Preface


About Tea Time Numerical Analysis 

Greetings! And thanks for giving Tea Time Numerical Analysis a read. This textbook was born of a desire 
to contribute a viable, completely free, introductory Numerical Analysis textbook for instructors and students 
of mathematics. When this project began (summer 2012), there were traditionally published (very expensive 
hardcover) textbooks, notably the excellent Numerical Analysis by Burden and Faires, which was in its ninth 
edition. As you might guess by the number of editions, this text is a classic. It is one of very few numerical 
analysis textbooks geared for the mathematician, not the scientist or engineer. In fact, I studied from an early 
edition in the mid 1990’s! Also in the summer of 2012 there were a couple of freely available websites, notably the 
popular http://nm.mathforcollege.com/, complete with video lectures. However, no resource I could find included 
a complete, single-pdf downloadable textbook designed for mathematics classes. To be just that is the ultimate 
goal of Tea Time Numerical Analysis. 

The phrase “tea time” is meant to do more than give the book a catchy title. It is meant to describe the general 
nature of the discourse within. Much of the material will be presented as if it were being told to a student during tea 
time at University, but with the benefit of careful planning. There will be no big blue boxes highlighting the main 
points, no stream of examples after a short introduction to a topic, and no theorem... proof... theorem... proof
structure. Instead, the necessary terms and definitions and theorems and examples will be woven into a more 
conversational style. My hope is that this blend of formal and informal mathematics will be easier to digest, and 
dare I say, students will be more invited to do their reading in this format. 

Those who enjoy a more typical presentation might still find this textbook suits their preference to a large extent. 
There will be a summary of the key concepts at the end of each conversation and a number of the exercises will be 
solved in complete detail in the appendix. So, one can get a closer-to-typical presentation by scanning for theorems 
in the conversations, reading the key concepts, and then skipping to the exercises with solutions. I hope most 
readers won’t choose to do so, but it is an option. In any case, the exercises with solutions will be critical reading 
for most. Learning by example is often the most effective means. After reading a section, or at least scanning 
it, readers are strongly encouraged to skip to the statements of the exercises with solutions (marked by [S] or [A]),
contemplate their solutions, solve them if they can, and then turn to the back of the book for full disclosure. The 
hope is that, with their placement in the appendix, readers will be more apt to consider solving the exercises on 
their own before looking at the solutions. 

The topical coverage in Tea Time Numerical Analysis is fairly typical. The book starts with an introductory 
chapter, followed by root finding methods, interpolation (part 1), numerical calculus, interpolation (part 2), and 
the second edition introduces a chapter on differential equations. The first five chapters cover what, at SCSU, 
constitutes a first semester course in numerical analysis. As this book is intended for use as a free download or 
an inexpensive print-on-demand volume, no effort has been made to keep the page count low or to spare copious 
diagrams and colors. In fact, I have taken the inexpensive mode of delivery as liberty to do quite the opposite. I 
have added many passages and diagrams that are not strictly necessary for the study of numerical analysis, but 
are at least peripherally related, and may be of interest to some readers. Most of these passages will be presented 
as digressions, so they will be easy to identify. For example, Taylor’s theorem plays such a central role in the 
subject that not only its statement is presented. Its proof and a bit of history are added as “crumpets”. Of course 
they can be skipped, but are included to provide a more complete understanding of this fundamental theorem of 
numerical analysis. For another example, as a fan of dynamical systems, I found it impossible to refrain from 
including a section on visualizing Newton’s Method. The powerful and beautiful pictures of Newton’s Method as a 
dynamical system should be eyebrow-raising and question-provoking even if only tangentially important. There are,
of course, other examples of somewhat less critical content, but each is there to enhance the reader’s understanding 
or appreciation of the subject, even if the material is not strictly necessary for an introductory study of numerical 
analysis. 

Along the way, implementation of the numerical methods in the form of computer code will also be discussed. 
While one could simply ignore the programming sections and exercises and still get something out of this text, it 
is my firm belief that full appreciation for the content cannot be achieved without getting one’s hands “dirty” by
doing some programming. It would be nice if readers have had at least some minimal exposure to programming 
whether it be Java, or C, web programming, or just about anything else. But I have made every effort to give 
enough detail so that even those who have never written even a one-line program will be able to participate in this 
part of the study. 

In keeping with the desire to produce a completely free learning experience, GNU Octave was chosen as the 
programming language for this book. GNU Octave (Octave for short) is offered freely to anyone and everyone! It 
is free to download and use. Its source code is free to download and study. And anyone is welcome to modify or 
add to the code if so inclined. As an added bonus, users of the much better-known MATLAB will not be burdened 
by learning a new language. Octave is a MATLAB clone. By design, nearly any program written in MATLAB will 
run in Octave without modification. So, if you have access to MATLAB and would prefer to use it, you may do so 
without worry. I have made considerable effort to ensure that every line of Octave in this book will run verbatim 
under MATLAB. Even with this earnest effort, though, it is possible that some of the code will not run under 
MATLAB. It has only been tested in Octave! If you find any code that does not run in MATLAB, please let me 
know. 

I hope you enjoy your reading of Tea Time Numerical Analysis. It was my pleasure to write it. Feedback is 
always welcome. 


Leon Q. Brin 
brinll@southernct.edu


How to Get GNU Octave 

Octave is developed by the GNU Project for the GNU operating system, which is most often paired with a Linux 
kernel. At its core, Octave is, therefore, GNU/Linux software. It runs natively on GNU/Linux machines. It must be 
ported (converted somehow) to run on other operating systems like Windows or OS X. Ports (converted programs) 
do exist for these operating systems, but are significantly more complicated to install than native Windows or native 
OS X programs. Nonetheless, the advantage to this approach is the end result which looks and runs very much 
like a native application, desktop shortcut/alias and all. The disadvantage is the somewhat lengthy installation
procedure with parts that sometimes don’t work together as expected, resulting in a failed installation. 

Windows and Mac users may also install hardware virtualizing software. Such software is freely available as 
native Windows and native OS X software. Then a complete GNU/Linux operating system can be installed inside 
the virtualizer as a so-called virtual machine. Octave can be installed in the virtual machine as a native program. 
With some configuring, the virtual machine can be made to look and feel almost like other Windows or OS X 
apps. The advantage to this approach is that installation is relatively straightforward. The disadvantage is that it 
requires a lot of computing resources. People with an old (slow) machine or a machine with little RAM (memory) 
will likely be disappointed in performance. Octave will be even slower than other programs if it runs at all. 

GNU/Linux can also be installed “side-by-side” with Windows or OS X, creating a dual-boot machine. The 
advantage to this approach is it relieves all of the issues of the other two methods. Octave is installed as a native 
application and all computer resources are dedicated to GNU/Linux so Octave will run as quickly as possible on 
your machine. The primary disadvantage to this approach is that you will have to decide whether to run your 
usual (Windows or OS X) operating system or GNU/Linux every time the computer starts. You will not be able 
to switch between Octave and the apps you are used to running. For example, switching from iTunes to Octave, 
or from Word to Octave and back, is not possible. You get one or the other. A secondary disadvantage is the 
need to repartition the computer’s hard drive (or the need to add an additional hard drive to the machine), making 
the installation process potentially devastating to the machine. A complete backup of your machine is required to 
maintain safety. 

All that may not mean much to you. To see how it translates into advice and step-by-step instructions on 
installing GNU Octave, see this textbook’s companion website, 


http://lqbrin.github.io/tea-time-numerical/more.html.


How to Get the Code

All the code appearing in the textbook can be downloaded from this textbook’s companion website, 

http://lqbrin.github.io/tea-time-numerical/ancillaries.html.

The code printed within and accompanying Tea Time Numerical Analysis electronically is distributed under the 
GNU General Public License (GPL). Details are available at the website.


Preliminaries


1.1 Accuracy 


Measuring Error 


Numerical methods are designed to approximate one thing or another. Sometimes roots, sometimes derivatives 
or definite integrals, or curves, or solutions of differential equations. As numerical methods produce only
approximations to these things, it is important to have some idea how accurate they are. Sometimes accuracy comes
down to careful algebraic analysis, sometimes careful analysis of the calculus, and often careful analysis of Taylor
polynomials. But before we can tackle those details, we should discuss just how error and, therefore, accuracy are
measured. 

There are two basic measurements of accuracy: absolute error and relative error. Suppose that $p$ is the value
we are approximating, and $\tilde{p}$ is an approximation of $p$. Then $\tilde{p}$ misses the mark by exactly the
quantity $\tilde{p} - p$, the so-called error. Of course, $\tilde{p} - p$ will be negative when $\tilde{p}$ misses low,
that is, when the approximation $\tilde{p}$ is less than the exact value $p$. On the other hand, $\tilde{p} - p$ will be
positive when $\tilde{p}$ misses high. But generally, we are not concerned with whether our approximation is too high
or too low. We just want to know how far off it is. Thus, we most often talk about the absolute error,
$|\tilde{p} - p|$. You might recognize the expression $|\tilde{p} - p|$ as the distance between $\tilde{p}$ and $p$,
and that’s not a bad way to think about absolute error.

The absolute error in approximating $p = \pi$ by the rational number $\tilde{p} = \frac{22}{7}$ is
$\left|\frac{22}{7} - \pi\right| \approx 0.00126$. The absolute error in approximating $\pi^5$ by the rational number
$\frac{16525}{54}$ is $\left|\frac{16525}{54} - \pi^5\right| \approx 0.00116$. The absolute errors in these
two approximations are nearly equal. To make the point more transparent, $\pi \approx 3.14159$ and
$\frac{22}{7} \approx 3.14285$, while $\pi^5 \approx 306.01968$ and $\frac{16525}{54} \approx 306.01851$. Each
approximation begins to differ from its respective exact value in the thousandths place. And each is off by only 1
in the thousandths place.

But there is something more going on here. $\pi$ is near 3 while $\pi^5$ is near 300. To approximate $\pi$ accurate
to the nearest one hundredth requires the approximation to agree with the exact value in only 3 place values: the
ones, tenths, and hundredths. To approximate $\pi^5$ accurate to the nearest one hundredth requires the approximation
to agree with the exact value in 5 place values: the hundreds, tens, ones, tenths, and hundredths. To use more
scientific language, we say that $\frac{22}{7}$ approximates $\pi$ accurate to 3 significant digits while
$\frac{16525}{54}$ approximates $\pi^5$ accurate to 5 significant digits. Therein lies the essence of relative errors:
weighing the absolute error against the magnitude of the number being approximated. This is done by computing the
ratio of the error to the exact value.

Hence, the relative error in approximating $\pi$ by $\frac{22}{7}$ is
$\frac{\left|\frac{22}{7} - \pi\right|}{|\pi|} \approx 4.02(10)^{-4}$ while the relative error in approximating
$\pi^5$ by $\frac{16525}{54}$ is $\frac{\left|\frac{16525}{54} - \pi^5\right|}{|\pi^5|} \approx 3.81(10)^{-6}$. The
relative errors differ by a factor of about 100 (equivalent to about two significant digits of accuracy) even though
the absolute errors are nearly equal. In general, the relative error in approximating $p$ by $\tilde{p}$ is given by

$$\frac{|\tilde{p} - p|}{|p|}.$$
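The two measurements can be tried out directly in Octave. What follows is a minimal sketch, not code from the text: the names abs_err and rel_err are illustrative helpers, and fprintf is used so that the lines should also run in MATLAB.

```octave
% Illustrative helpers (not built-in functions): absolute and relative
% error of an approximation pt to an exact value p.
abs_err = @(p, pt) abs(pt - p);
rel_err = @(p, pt) abs(pt - p) / abs(p);

% The two approximations discussed above.
a1 = abs_err(pi,   22/7);      r1 = rel_err(pi,   22/7);
a2 = abs_err(pi^5, 16525/54);  r2 = rel_err(pi^5, 16525/54);

% Nearly equal absolute errors, very different relative errors.
fprintf('22/7:      absolute %.3e   relative %.3e\n', a1, r1);
fprintf('16525/54:  absolute %.3e   relative %.3e\n', a2, r2);
```

The ratio r1/r2 should come out near 100, matching the factor mentioned above.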


Sources of Error 

There are two general categories of error: algorithmic error and floating-point error. Algorithmic error is any error
due to the approximation method itself. That is, these errors are unavoidable even if we do exact calculations at
every step. Floating-point error is error due to the fact that computers and calculators generally do not do exact
arithmetic, but rather do floating-point arithmetic.


Crumpet 1: IEEE Standard 754 


Floating-point values are stored in binary. According to the IEEE Standard 754, which most computers use, the
mantissa (or significand) is stored using 52 bits, or binary places. Since the leading bit is always assumed to
be 1 (and, therefore, not actually stored), each floating point number is represented using 53 consecutive binary
place values. Now let’s consider how 1/7 is represented exactly. In binary, one seventh is equal to
$0.001001001\ldots$ because $\frac{1}{7} = \sum_{i=1}^{\infty} 2^{-3i} = \frac{1}{8} + \frac{1}{64} + \frac{1}{512} + \cdots$.
To see that this is true, remember from calculus that

$$\sum_{i=1}^{\infty} 2^{-3i} = \sum_{i=1}^{\infty} \left(2^{-3}\right)^i = \frac{2^{-3}}{1 - 2^{-3}} = \frac{1/8}{7/8} = \frac{1}{7}.$$

But in IEEE Standard 754, $\frac{1}{7}$ is chopped to

$$1.0010010010010010010010010010010010010010010010010010 \times (2)^{-3}$$

or $\sum_{i=1}^{18} 2^{-3i}$, which is exactly

$$\frac{2573485501354569}{18014398509481984}.$$

The floating point error in calculating 1/7 is, therefore,

$$\left|\frac{2573485501354569}{18014398509481984} - \frac{1}{7}\right| = \frac{1}{126100789566373888} \approx 7.93(10)^{-18}.$$


References [35, 1 


In floating-point arithmetic, a calculator or computer typically stores its values with about 16 significant digits.
For example, in a typical computer or calculator (using double precision arithmetic), the number $\frac{1}{7}$ is
stored as about 0.1428571428571428, while the exact value is 0.1428571428571428.... In the exact value, the pattern of
142857 repeats without cease, while in the floating point value, the repetition ceases after the third 8. The value
is chopped to 16 decimal places in the floating-point representation. So the floating point error in calculating 1/7
is around $5(10)^{-17}$. I say “around” or “about” in this discussion because these claims are not precisely true, but
the point is made. There is a small error in representing 1/7 as a floating point real number. And the same is true
about all real numbers save a finite set.
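The stored value of 1/7 can be inspected in Octave. This is a small sketch under the assumption that the machine uses IEEE double precision; the variable names x and stored are illustrative.

```octave
x = 1/7;                             % what the machine actually stores

% Print more digits than are meaningful to see where the stored value
% departs from the repeating pattern 142857.
fprintf('%.20f\n', x);

% The fraction quoted in Crumpet 1. Both the numerator and the power of
% two are exactly representable, so this quotient is computed exactly.
stored = 2573485501354569 / 2^54;
fprintf('stored value equals the fraction from Crumpet 1: %d\n', x == stored);
```

If the comparison prints 1, the machine holds exactly the fraction worked out in the crumpet.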

Yes, there is some error in the floating-point representation of real numbers, but it is always small in comparison
to the size of the real number being represented. The relative error is around $10^{-17}$, so it may seem that the
consideration of floating-point error is strictly an academic exercise. After all, what’s an error of
$7.93(10)^{-18}$ among friends? Is anyone going to be upset if they are sold a ring that is .14285714285714284921
inches wide when it should be .14285714285714285714 inches wide? Clearly not. But it is not only the error in a
single calculation (sum, difference, product, or quotient) that you should be worried about. Numerical methods
require dozens, thousands, and even millions of computations. Small errors can be compounded. Try the following
experiment.

Experiment 1 

Use your calculator or computer to calculate the numbers $p_0, p_1, p_2, \ldots, p_7$ as prescribed here:

• $p_0 = \pi$

• $p_1 = 10p_0 - 31$

• $p_2 = 100p_1 - 41$

• $p_3 = 100p_2 - 59$

• $p_4 = 100p_3 - 26$

• $p_5 = 100p_4 - 53$

• $p_6 = 100p_5 - 58$

• $p_7 = 100p_6 - 97$

According to your calculator or computer, $p_7$ is probably something like one of these:

0.93116 (Octave)

.9311599796346854 (Maxima)

1 (CASIO fx-115ES)

However, a little algebra will show that $p_7 = 10000000000000\pi - 31415926535897$ exactly (which is approximately
0.932384). Even though $p_0$ is a very accurate approximation of $\pi$, after just a few (carefully selected)
computations, round-off error has caused $p_7$ to have only one or two significant digits of accuracy!

This experiment serves to highlight the most important cause of floating-point error: subtraction of nearly equal
numbers. We repeatedly subtract numbers whose tens and ones digits agree. Their two leading significant digits
match. For example, $10\pi - 31 = 31.415926\ldots - 31$. $10\pi$ is held accurate to about 16 digits
(31.41592653589793) but $10\pi - 31$ is held accurate to only 14 significant digits (0.41592653589793). Each
subsequent subtraction decreases the accuracy by two more significant digits. Indeed, $p_7$ is represented with only
2 significant digits. We have repeatedly subtracted nearly equal numbers. Each time, some accuracy is lost. The
error grows.
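The loss of about two digits per step can be watched as it happens. The sketch below is one possible way to do so and is not from the text: the vectors m, s, and exact are illustrative names, and the entries of exact are the values $p_1, \ldots, p_7$ read off the decimal expansion of $\pi$ and rounded to 16 digits.

```octave
m = [10 100 100 100 100 100 100];    % multipliers from Experiment 1
s = [31 41 59 26 53 58 97];          % numbers subtracted in Experiment 1

% Exact p_1..p_7 to 16 digits, taken from the decimal expansion of pi.
exact = [0.4159265358979324 0.5926535897932385 0.2653589793238463 ...
         0.5358979323846264 0.5897932384626434 0.9793238462643383 ...
         0.9323846264338328];

p = pi;
for k = 1:7
  p = m(k)*p - s(k);                 % one step of the experiment
  fprintf('p%d = %.15f   error = %.1e\n', k, p, abs(p - exact(k)));
end
```

Each line of output should show an error roughly 100 times larger than the one before it.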

In computations that don’t involve the subtraction of nearly equal quantities, there is the concern of algorithmic
error. For example, let $f(x) = \sin x$. Then one can prove from the definition of derivative that

$$f'(1) = \lim_{h \to 0} \frac{\sin(1+h) - \sin(1-h)}{2h}.$$

Therefore, we should expect, in general, that $p(h) = \frac{\sin(1+h) - \sin(1-h)}{2h}$ is a good approximation of
$f'(1)$ for small values of $h$; and that the smaller $h$ is, the better the approximation is.


Experiment 2 

Using a calculator or computer, compute $p(h)$ for $h = 10^{-2}$, $h = 10^{-3}$, and so on through $h = 10^{-7}$.
Your results should be something like this:

h            p*(h)

$10^{-2}$    0.5402933008747335
$10^{-3}$    0.5403022158176896
$10^{-4}$    0.5403023049677103
$10^{-5}$    0.5403023058569989
$10^{-6}$    0.5403023058958567
$10^{-7}$    0.5403023056738121
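One way to produce such a table in Octave is sketched here. The function handle name pstar is an illustrative choice, and the last digit or two of each value may differ from machine to machine.

```octave
% Centered difference quotient for f(x) = sin(x) at x = 1,
% evaluated in ordinary floating-point arithmetic.
pstar = @(h) (sin(1 + h) - sin(1 - h)) / (2*h);

for k = 2:7
  h = 10^(-k);
  fprintf('10^-%d   %.16f\n', k, pstar(h));
end
```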


The second column is labeled $p^*(h)$ to indicate that the approximation $p(h)$ is calculated using approximate
(floating-point) arithmetic, so it is technically an approximation of the approximation. Since
$f'(1) = \cos(1) \approx .5403023058681398$, each approximation is indeed reasonably close to the exact value. Taking a
closer look, though, there is something more to be said. First, the algorithmic error of $p(10^{-2})$ is

$$\left|p(10^{-2}) - f'(1)\right| = \left|50\left(\sin\frac{101}{100} - \sin\frac{99}{100}\right) - \cos(1)\right| \approx 9.00(10)^{-6}$$

accurate to three significant digits. That is, if we compute $p(10^{-2})$ using exact arithmetic, the value still
misses $f'(1)$ by about $9(10)^{-6}$. The floating-point error is only how far the computed value of $p(10^{-2})$, what
we have labeled $p^*(10^{-2})$ in the table, deviates from the exact value of $p(10^{-2})$. That is, the floating-point
error is given by $|p^* - p|$:

$$\left|0.5402933008747335 - 50\left(\sin\frac{101}{100} - \sin\frac{99}{100}\right)\right| \approx 1.58(10)^{-17},$$

as small as one could expect. The absolute error $|p^*(10^{-2}) - f'(1)| = |0.5402933008747335 - \cos(1)|$ is
essentially all algorithmic. The round-off error is dwarfed by the algorithmic error. The fact that we have used
floating-point arithmetic is negligible.

On the other hand, the algorithmic error of $p(10^{-7})$ is

$$\left|p(10^{-7}) - f'(1)\right| = \left|5000000\left(\sin\frac{10000001}{10000000} - \sin\frac{9999999}{10000000}\right) - \cos(1)\right| \approx 9.00(10)^{-16}$$

accurate to three significant digits. But we should be a little bit worried about the floating-point error since
$\sin\frac{10000001}{10000000} \approx 0.8414710388$ and $\sin\frac{9999999}{10000000} \approx .8414709307$ are nearly
equal. We are subtracting numbers whose five leading significant digits match! Indeed, the floating-point error is,
again, $|p^* - p|$, or

$$\left|0.5403023056738121 - 5000000\left(\sin\frac{10000001}{10000000} - \sin\frac{9999999}{10000000}\right)\right| \approx 1.94(10)^{-10}.$$

Perhaps this error seems small, but it is very large compared to the algorithmic error of about $9(10)^{-16}$. So, in
this case, the error is essentially all due to the fact that we are using floating-point arithmetic! This time, the
algorithmic error is dwarfed by the round-off error. Luckily, this will not often be the case, and we will be free to
focus on algorithmic error alone.
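Exact arithmetic is not available in Octave, so the algorithmic error cannot be computed there directly. As a rough stand-in, the sketch below assumes the standard Taylor estimate $h^2 \cos(1)/6$ for the algorithmic error of this difference quotient. That estimate is not derived in this section; it is used here only because it reproduces the values $9.00(10)^{-6}$ and $9.00(10)^{-16}$ quoted above. The names pstar, total, and est are illustrative.

```octave
pstar = @(h) (sin(1 + h) - sin(1 - h)) / (2*h);

for k = [2 7]
  h     = 10^(-k);
  total = abs(pstar(h) - cos(1));    % observed error, round-off included
  est   = h^2 * cos(1) / 6;          % assumed estimate of algorithmic error
  fprintf('h = 10^-%d   total = %.2e   algorithmic (est.) = %.2e\n', ...
          k, total, est);
end
```

For $h = 10^{-2}$ the two columns should nearly agree, so the error is algorithmic. For $h = 10^{-7}$ the total should be several orders of magnitude larger than the estimate, so the error is round-off.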


Crumpet 2: Chaos 


Edward Lorenz, a meteorologist at the Massachusetts Institute of Technology, was among the first to recognize 
and study the mathematical phenomenon now called chaos. In the early 1960’s he was busy trying to model 
weather systems in an attempt to improve weather forecasting. As one version of the story goes, he wanted to 
repeat a calculation he had just made. In an effort to save some time, he used the same initial conditions he 
had the first time, only rounded off to three significant digits instead of six. Fully expecting the new calculation 
to be similar to the old, he went out for a cup of coffee and came back to look. To his astonishment, he 
noticed a completely different result! He repeated the procedure several times, each time finding that small 
initial variations led to large long-term variations. Was this a simple case of floating-point error? No. Here’s a 
rather simplified version of what happened. Let $f(x) = 4x(1 - x)$ and set $p_0 = 1/7$. Now compute $p_1 = f(p_0)$,
$p_2 = f(p_1)$, $p_3 = f(p_2)$, and so on until you have $p_{40} = f(p_{39})$. You should find that
$p_{40} \approx 0.080685$. Now set $p_0 = 1/7 + 10^{-12}$ (so we can run the same computation only with an initial
value that differs from the original by the tiny amount, $10^{-12}$). Compute as before, $p_1 = f(p_0)$,
$p_2 = f(p_1)$, $p_3 = f(p_2)$, and so on until you have $p_{40} = f(p_{39})$. This time you should find that
$p_{40} \approx 0.91909$, a completely different result! If you go back and run the two calculations using 100
significant digit arithmetic, you will find that beginning with $p_0 = 1/7$ leads to $p_{40} \approx .080736$ while
beginning with $p_0 = 1/7 + 10^{-12}$ leads to $p_{40} \approx 0.91912$. In other words, it is not the
fact that we are using floating-point approximations that makes these two computations turn out drastically
different. Using 1000 significant digit arithmetic would not change the conclusion, nor would any more precise
calculation. This is a demonstration of what’s known as sensitivity to initial conditions, a feature of all chaotic
systems including the weather. Tiny variations at some point lead to vast variations later on. And the “errors”
are algorithmic. This is the basic principle that makes long-range weather forecasting impossible. In the words
of Edward Lorenz, “In view of the inevitable inaccuracy and incompleteness of weather observations, precise
very-long-range forecasting would seem non-existent.”

References [19, 14, 1] 
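The computation described in Crumpet 2 takes only a few lines of Octave. This is a minimal sketch; the variable names p and q for the two runs are illustrative, and the printed values should be close to those quoted in the crumpet, though the trailing digits may vary.

```octave
f = @(x) 4*x*(1 - x);

p = 1/7;                 % first initial value
q = 1/7 + 1e-12;         % second initial value, differing by 10^-12

for k = 1:40             % forty applications of f to each
  p = f(p);
  q = f(q);
end

fprintf('starting at 1/7:            %.6f\n', p);
fprintf('starting at 1/7 + 10^-12:   %.6f\n', q);
```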


Experiment 3 

Let $a = 77617$ and $b = 33096$, and compute

$$333.75b^6 + a^2(11a^2b^2 - b^6 - 121b^4 - 2) + 5.5b^8 + \frac{a}{2b}.$$

You will probably get a number like $-1.180591620717411(10)^{21}$ even though the exact value is

$$-\frac{54767}{66192} \approx -.8273960599468214.$$

That’s an incredible error! But it’s not because your calculator or computer has any problem calculating each term
to a reasonable degree of accuracy. Try it.


$333.75b^6 = 438605750846393161930703831040$

$a^2(11a^2b^2 - b^6 - 121b^4 - 2) = -7917111779274712207494296632228773890$

$5.5b^8 = 7917111340668961361101134701524942848$

$\frac{a}{2b} = \frac{77617}{66192} \approx 1.172603940053179$


The reason the calculation is so poor is that nearly equal values are subtracted after each term is calculated.
$a^2(11a^2b^2 - b^6 - 121b^4 - 2)$ and $5.5b^8$ have opposite signs and match in their greatest 7 significant digits,
so calculating their sum decreases the accuracy by about 7 significant digits. To make matters worse,
$a^2(11a^2b^2 - b^6 - 121b^4 - 2) + 5.5b^8 = -438605750846393161930703831042$, which has the opposite sign of
$333.75b^6$ and matches it in every place value except the ones. That’s 29 digits! So we lose another 29 significant
digits of accuracy in adding this sum to $333.75b^6$. Doing the calculation exactly, the sum
$333.75b^6 + a^2(11a^2b^2 - b^6 - 121b^4 - 2) + 5.5b^8$ is $-2$.
But the computation needs to be carried out to 37 significant digits to realize this. Calculation using only about
16 significant digits, as most calculators and computers do, results in 0 significant digits of accuracy since 36 digits
of accuracy are lost during the calculation. That’s why you can get a number like $-1.180591620717411(10)^{21}$ for
your final answer instead of the exact answer $\frac{a}{2b} - 2 \approx -.8273960599468214$.

What may be even more surprising is that a simple rearrangement of the expression leads to a completely 
different result. Try computing 


$$(333.75 - a^2)b^6 + a^2(11a^2b^2 - 121b^4 - 2) + 5.5b^8 + \frac{a}{2b}$$


instead. This time you will likely get a number like 1.172603940053179. Again the result is entirely inaccurate, and 
the reason is the same. This time the individual terms are 


$(333.75 - a^2)b^6 = -7917110903377385049079188237280149504$

$a^2(11a^2b^2 - 121b^4 - 2) = -437291576312021946464244793346$

$5.5b^8 = 7917111340668961361101134701524942848$

$\frac{a}{2b} = \frac{77617}{66192} \approx 1.172603940053179$


so the problem persists. We still end up subtracting numbers of nearly equal value. The difference between this 
calculation and the last is rounding. In the first case, rounding caused two of the large numbers to disagree in their 
last significant digit, so they added up to something huge. In the second case, the sum of the first three terms turns 
out to be 0 because the large numbers agree in all significant digits. Note that in the second case, the final result 
is simply the value of $\frac{a}{2b}$.

As these examples show, sometimes floating-point error and sometimes algorithmic error can spoil a calculation. 
In general, it is very difficult to catch floating-point error, though. Algorithmic error is much more accessible. And 
most of the algorithms we will explore are not susceptible to floating point error. In almost all cases, the lion’s 
share of the error will be algorithmic. 

References [28, 18] 


Key Concepts 

$p$: The exact value being approximated.

$\tilde{p}$: An approximation of the value $p$.

Absolute error: $|\tilde{p} - p|$ is known as the absolute error in using $\tilde{p}$ to approximate the value $p$.

Relative error: $\frac{|\tilde{p} - p|}{|p|}$ is known as the relative error in using $\tilde{p}$ to approximate the
value $p$.

Accuracy: We say that $\tilde{p}$ is accurate to $n$ significant digits if the leading $n$ significant digits of
$\tilde{p}$ match those of $p$. More precisely, we say that $\tilde{p}$ is accurate to
$d(\tilde{p}) = \log\left|\frac{p}{\tilde{p} - p}\right|$ significant digits.


Floating-point arithmetic: Arithmetic using numbers represented by a fixed number of significant digits. 

Algorithmic error: Error caused solely by the algorithm or equation involved in the approximation,
$|\tilde{p} - p|$ where $\tilde{p}$ is an approximation of $p$ and is computed using exact arithmetic.

Truncation error: Algorithmic error due to use of a partial sum in place of a series. In this type of error, the tail
of the series is truncated, thus the name.


Floating-point error: Error caused solely by the fact that a computation is done using floating-point arithmetic,
$|p^* - \tilde{p}|$ where $p^*$ is computed using floating-point arithmetic, $\tilde{p}$ is computed using exact
arithmetic, and both are computed according to the same formula or algorithm.

Round-off error: Another name for floating-point error. 
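The accuracy formula can be checked against the two approximations from the start of the section. The sketch below assumes the logarithm in the definition is base 10, which is what makes the result count decimal digits; sig_digits is an illustrative name, not a built-in function.

```octave
% Number of significant digits of accuracy, assuming a base-10 logarithm.
sig_digits = @(p, pt) log10(abs(p / (pt - p)));

d1 = sig_digits(pi,   22/7);         % 22/7 as an approximation of pi
d2 = sig_digits(pi^5, 16525/54);     % 16525/54 as an approximation of pi^5

fprintf('22/7:      %.2f significant digits\n', d1);
fprintf('16525/54:  %.2f significant digits\n', d2);
```

The results should fall between 3 and 4 for the first approximation and between 5 and 6 for the second, in line with the 3 and 5 significant digits claimed earlier.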


Octave 

The computations of this section can easily be done using Octave. All you need are arithmetic operations and a 
few standard functions like the absolute value and sine and cosine. Luckily, none of these is very difficult using 
Octave. The arithmetic operations are done much like they would be on a calculator. There is but one important 
distinction. Most calculators will accept an expression like 3x and understand that you mean 3 × x, but Octave
will not. The expression 3x causes a syntax error in Octave. Octave needs you to specify the operation as in 3*x.

Standard functions like absolute value, sine, and cosine (and many others) have simple abbreviations in Octave. 
They all take one argument, or input. Think function notation and it will become clear how to find the sine or 
absolute value of a number. You need to type the name of the function, a left parenthesis, the argument, and a right 
parenthesis, as in sin(7.2). Some common functions and their abbreviations are listed in Table 1.1.


Table 1.1: Some common functions and their Octave abbreviations.

Function        Octave          Function          Octave      Function          Octave

$n!$            factorial(n)    $\sin(x)$         sin(x)      $\cos(x)$         cos(x)
$|x|$           abs(x)          $\tan(x)$         tan(x)      $\cot(x)$         cot(x)
$e^x$           exp(x)          $\sin^{-1}(x)$    asin(x)     $\cos^{-1}(x)$    acos(x)
$\ln(x)$        log(x)          $\tan^{-1}(x)$    atan(x)     $\cot^{-1}(x)$    acot(x)
$\sqrt{x}$      sqrt(x)         $\sinh(x)$        sinh(x)     $\cosh(x)$        cosh(x)
$\lfloor x \rfloor$   floor(x)  $\lceil x \rceil$ ceil(x)     $b^x$             b^x


Functions and arithmetic operations can be combined in the obvious way. A few examples from this section appear in
Table 1.2.


Table 1.2: Octave computations of some expressions.

Expression                                                 Octave                          Result

$\left|\frac{22}{7} - \pi\right|$                          abs(22/7-pi)                    0.0012645
$\frac{\left|\frac{16525}{54} - \pi^5\right|}{|\pi^5|}$    abs(16525/54-pi^5)/abs(pi^5)    3.8111e-06
$\frac{\sin(1.01) - \sin(0.99)}{0.02}$                     (sin(1.01)-sin(0.99))/0.02      0.54029


There are two things to observe. First, Octave notation is very much like calculator notation. Second, by default
Octave displays results using 5 significant digits. Don’t be fooled into thinking Octave has only computed those
five digits of the result, though. In fact, Octave has computed at least 15 digits correctly. And if you want to know
what they are, use the format('long') command. This command only needs to be used once per session. All
numbers printed after this command is run will be shown with 15 significant digits. For example, 1/7 will produce
0.142857142857143 instead of just 0.14286. If you would like to go back to the default format, use the format()
command with no arguments. We will discuss finer control over output later. For now, here are a few ways you
might do Experiment 1 using Octave. The only differences are the amount of output and the format of the output.
The numbers are being calculated exactly the same way and with exactly the same precision.

Experiment 1 in Octave, example 1

octave:1> p0=pi;
octave:2> p1=10*p0-31; p2=100*p1-41; p3=100*p2-59;
octave:3> p4=100*p3-26; p5=100*p4-53; p6=100*p5-58;
octave:4> p7=100*p6-97
p7 = 0.93116

Experiment 1 in Octave, example 2

octave:1> format('long')
octave:2> p0=pi
p0 = 3.14159265358979
octave:3> p1=10*p0-31
p1 = 0.415926535897931
octave:4> p2=100*p1-41
p2 = 0.592653589793116
octave:5> p3=100*p2-59
p3 = 0.265358979311600
octave:6> p4=100*p3-26
p4 = 0.535897931159980
octave:7> p5=100*p4-53
p5 = 0.589793115997963
octave:8> p6=100*p5-58
p6 = 0.979311599796347
octave:9> p7=100*p6-97
p7 = 0.931159979634685

Experiment 1 in Octave, example 3

octave:1> 10*pi-31
ans = 0.41593
octave:2> 100*ans-41
ans = 0.59265
octave:3> 100*ans-59
ans = 0.26536
octave:4> 100*ans-26
ans = 0.53590
octave:5> 100*ans-53
ans = 0.58979
octave:6> 100*ans-58
ans = 0.97931
octave:7> 100*ans-97
ans = 0.93116

Experiment 3 in Octave 

octave:1> a=77617;
octave:2> b=33096;
octave:3> t1=333.75*b^6;
octave:4> t2=a^2*(11*a^2*b^2-b^6-121*b^4-2);
octave:5> t3=5.5*b^8;
octave:6> t4=a/(2*b);
octave:7> t1+t2+t3+t4
ans = -1.18059162071741e+21
octave:8> t1=(333.75-a^2)*b^6;
octave:9> t2=a^2*(11*a^2*b^2-121*b^4-2);
octave:10> t1+t2+t3+t4
ans = 1.17260394005318
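One way to see why the two orderings in Experiment 3 disagree so badly is to look at how far apart neighboring floating point numbers are near the largest term. The following is a minimal sketch, assuming the same values of a and b as in the session; the variable name gap is only illustrative.

```octave
a = 77617;
b = 33096;
t3 = 5.5*b^8;        % one of the four terms summed in Experiment 3
gap = eps(t3)        % distance from t3 to the next larger double
gap == 2^70          % the spacing is a power of two, about 1.18e+21
```

Doubles near t3 are about 1.18e+21 apart, which is exactly the magnitude of the first answer printed in the session. A sum whose terms are this large cannot resolve a result of ordinary size.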

In the end, the way you choose to complete an exercise in Octave will be a matter of preference, and will depend on 
your goal. You should ask yourself questions like the following. How many significant digits do I need? How many 
intermediate results do I need to see? Which ones? The answers to such questions should guide your solution. 
When needed, Octave has abbreviations for most common constants. Table 1.3 shows the three most common. 


Table 1.3: Some Octave constants.

Constant    Octave    Result
$e$         e         2.7183
$\pi$       pi        3.1416
$i$         i or j    0 + 1i
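As a quick illustration of the constants in Table 1.3, here is a small sketch that combines all three in Euler's identity, $e^{i\pi} = -1$. The variable names z and err are illustrative only, and the session assumes none of e, i, or pi has been reassigned.

```octave
z = e^(i*pi);        % mathematically exactly -1
err = abs(z + 1)     % tiny but typically nonzero because of round-off
```

The result is not exactly zero because pi is stored only to about 16 significant digits.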


Exercises 

1. Besides round-off error, how may the accuracy of a numerical calculation be adversely affected?

2. Compute the absolute and relative errors in the approximation of $\pi$ by 3.

3. Calculate the absolute error in approximating $p$ by $\tilde{p}$.

(a) $p = 123$; $\tilde{p} =$ [illegible]
(b) $p = 1$; $\tilde{p} = .3666$
(c) $p = 2^{10}$; $\tilde{p} = 1000$ [S]
(d) $p = 24$; $\tilde{p} = 48$
(e) $p = \pi^{-7}$; $\tilde{p} = 10^{-4}$
(f) $p = (0.062847)(0.069234)$; $\tilde{p} = 0.0042$

4. Calculate the relative errors in the approximations of question 3.

5. How many significant digits of accuracy do the approximations of question 3 have?

6. Compute the absolute error and relative error in approximations of $p$ by $\tilde{p}$.

(a) $p = \sqrt{2}$, $\tilde{p} = 1.414$
(b) $p = 10^{\pi}$, $\tilde{p} = 1400$
(c) $p = 9!$, $\tilde{p} = \sqrt{18\pi}\,(9/e)^9$


7. Calculate $\frac{1103\sqrt{8}}{9801}$ using Octave.

8. The number in question 7 is an approximation of $1/\pi$. Using Octave, find the absolute and relative errors in the approximation.

9. Using Octave, calculate

(a) $\lfloor \ln(234567) \rfloor$
(b) $e^{\lceil \ln(234567) \rceil}$
(c) $\sqrt{\lfloor \sin(e^{5.2}) \rfloor}$
(d) [illegible]
(e) $4\tan^{-1}(1)$
(f) $\frac{\lfloor \cos(3) - \sqrt{\ln(3)} \rfloor}{[\arctan(3) - e^{3}]}$

10. Find $f(2)$ using Octave.

(a) $f(x) = e^{\sin(x)}$ [S]
(b) $f(x) = \sin(e^x)$
(c) $f(x) = \tan^{-1}(x - 0.429)$
(d) $f(x) = x - \tan^{-1}(0.429)$
(e) $f(x) = 1075!$ [A]
(f) $f(x) = 5\sqrt{x}$

11. All of these equations are mathematically true. Nonetheless, floating point error causes some of them to be false according to Octave. Which ones? HINT: Use the boolean operator == to check. For example, to check if $\sin(0) = 0$, type sin(0)==0 into Octave. ans=1 means true (the two sides are equal according to Octave; no round-off error) and ans=0 means false (the two sides are not equal according to Octave; round-off error).

(a) $(2)(12) = 9^2 - 4(9) - 21$
(b) $e^{3\ln(2)} = 8$
(c) $\ln(10) = \ln(5) + \ln(2)$
(d) $g\left(\frac{1+\sqrt{5}}{2}\right) = \frac{1+\sqrt{5}}{2}$ where $g(x) = \sqrt[3]{x^2 + x}$
(e) $\lfloor 153465/3 \rfloor = 153465/3$
(f) $3\pi^3 + 7\pi^2 - 2\pi + 8 = ((3\pi + 7)\pi - 2)\pi + 8$

12. Find an approximation $\tilde{p}$ of $p$ with absolute error .001.

(a) $p = \pi$
(b) $p = \sqrt{5}$
(c) $p = \ln(3)$
(d) $p = 723$
(f) $p = \tan(1.57079)$

13. Find an approximation $\tilde{p}$ of $p$ with relative error .001 for each value of $p$ in question 12.

14. $\tilde{p}$ approximates what value with absolute error .0005?

(a) $\tilde{p} = .2348263818643$ [A]
(b) $\tilde{p} = 23.89627345677$
(c) $\tilde{p} = -8.76257664363$

15. Repeat question 14 except with relative error .0005.

16. $\tilde{p}$ approximates $p$ with absolute error [illegible] and relative error [illegible]. Find $p$ and $\tilde{p}$.

17. $\tilde{p}$ approximates $p$ with absolute error [illegible] and relative error [illegible]. Find $p$ and $\tilde{p}$.

18. Suppose $\tilde{p}$ must approximate $p$ with relative error at most $10^{-3}$. Find the largest interval in which $\tilde{p}$ must lie if $p = 900$.

19. The number $e$ can be defined by $e = \sum_{n=0}^{\infty} (1/n!)$. Compute the absolute error and relative error in the following approximations of $e$:

(a) $\sum_{n=0}^{N} \frac{1}{n!}$, with upper limit $N$ = [illegible]
(b) $\sum_{n=0}^{10} \frac{1}{n!}$

20. The golden ratio, $\frac{1+\sqrt{5}}{2}$, is found in nature and in mathematics in a variety of places. For example, if $F_n$ is the $n^{\text{th}}$ Fibonacci number, then

\[ \lim_{n\to\infty} \frac{F_{n+1}}{F_n} = \frac{1+\sqrt{5}}{2}. \]

Therefore, $F_{11}/F_{10}$ may be used as an approximation of the golden ratio. Find the relative error in this approximation. HINT: The Fibonacci sequence is defined by $F_0 = 1$, $F_1 = 1$, $F_n = F_{n-1} + F_{n-2}$ for $n \ge 2$.


21. Find values for $p$ and $\tilde{p}$ so that the relative and absolute errors are equal. Make a general statement about conditions under which this will happen. [A]

22. Find values for $p$ and $\tilde{p}$ so that the relative error is greater than the absolute error. Make a general statement about conditions under which this will happen.

23. Find values for $p$ and $\tilde{p}$ so that the relative error is less than the absolute error. Make a general statement about conditions under which this will happen.

24. Calculate (i) $p^*$ using a calculator or computer, (ii) the absolute error, $|p^* - p|$, and (iii) the relative error, $\frac{|p^* - p|}{|p|}$. Then use the given value of $\tilde{p}$ to compute (iv) the algorithmic error, $|\tilde{p} - p|$, and (v) the round-off error, $|p^* - \tilde{p}|$.

(a) Let $f(x) = x^4 + 7x^3 - 63x^2 - 295x + 350$ and let $p = f'(-2)$. The value $\tilde{p} =$ [illegible] is a good approximation of $p$. $\tilde{p}$ is exactly 8.99999999999999. [A]

(b) Let $f'(x) = e^x \sin(10x)$ and $f(0) = 0$ and let $p = f(1)$. It can be shown that $p = \frac{1}{101} e (\sin 10 - 10\cos 10) + \frac{10}{101}$. Euler's method produces the approximation $\tilde{p} =$ [illegible]. Accurate to 28 significant digits, $\tilde{p}$ is 0.2071647018159241499410798569.

(c) Let $a_0 =$ [illegible] and $a_{n+1} = 4a_n(1 - a_n)$, and consider $p = a_{51}$. It can be shown that $p = a_{51} =$ [illegible]. The most direct algorithm for calculating $a_{51}$ is to calculate $a_1, a_2, a_3, \ldots, a_{51}$ in succession, according to the given recursion relation. Use this algorithm to compute $p^*$ and $\tilde{p}$.
1.2 Taylor Polynomials 

One of the cornerstones of numerical analysis is Taylor’s theorem about which you learned in Calculus. A short 
study bears repeating here, however. 

Theorem 1. Suppose that $f(x)$ has $n+1$ derivatives on $(a,b)$, and $x_0 \in (a,b)$. Then for each $x \in (a,b)$, there exists a $\xi$, depending on $x$, lying strictly between $x$ and $x_0$ such that

\[ f(x) = f(x_0) + \sum_{j=1}^{n} \frac{f^{(j)}(x_0)}{j!}(x-x_0)^j + \frac{f^{(n+1)}(\xi)}{(n+1)!}(x-x_0)^{n+1}. \]


Proof. Let $I$ be the open interval between $x$ and $x_0$ and $\bar{I}$ be the closure of $I$. Since $I \subset \bar{I} \subset (a,b)$ and $f$ has $n+1$ derivatives on $(a,b)$, we have that $f, f', f'', \ldots, f^{(n)}$ are all continuous on $\bar{I}$ and that $f^{(n+1)}$ exists on $I$. We now define

\[ F(z) = f(x) - f(z) - \sum_{j=1}^{n} \frac{f^{(j)}(z)}{j!}(x-z)^j \]

and will prove the theorem by showing that $F(x_0) = \frac{(x-x_0)^{n+1}}{(n+1)!} f^{(n+1)}(\xi)$ for some $\xi \in I$. Note that $F'(z)$, a telescoping sum, is given by

\[ F'(z) = -f'(z) - \sum_{j=1}^{n}\left[\frac{f^{(j+1)}(z)}{j!}(x-z)^j - \frac{f^{(j)}(z)}{(j-1)!}(x-z)^{j-1}\right] = -f'(z) - \left[\frac{f^{(n+1)}(z)}{n!}(x-z)^n - f'(z)\right] = -\frac{f^{(n+1)}(z)}{n!}(x-z)^n. \]

Now define $g(z) = F(z) - \left(\frac{x-z}{x-x_0}\right)^{n+1} F(x_0)$. It is easy to verify that $g$ satisfies the premises of Rolle’s theorem. Indeed, $g(x_0) = g(x) = 0$ and the continuity and differentiability criteria are met. By Rolle’s theorem, there exists $\xi \in I$ such that $g'(\xi) = 0 = F'(\xi) + (n+1)\frac{(x-\xi)^n}{(x-x_0)^{n+1}} F(x_0)$. Hence,

\[ F(x_0) = -\frac{(x-x_0)^{n+1}}{(n+1)(x-\xi)^n} F'(\xi) = \frac{(x-x_0)^{n+1}}{(n+1)(x-\xi)^n}\cdot\frac{f^{(n+1)}(\xi)}{n!}(x-\xi)^n = \frac{f^{(n+1)}(\xi)}{n!\,(n+1)}(x-x_0)^{n+1} = \frac{f^{(n+1)}(\xi)}{(n+1)!}(x-x_0)^{n+1}. \]

This completes the proof. □

We will use the notation

\[ T_n(x) = f(x_0) + \sum_{j=1}^{n} \frac{f^{(j)}(x_0)}{j!}(x-x_0)^j \]

and call this the $n^{\text{th}}$ Taylor polynomial of $f$ expanded about $x_0$. We will also use the notation

\[ R_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}(x-x_0)^{n+1} \]

and call this the remainder term for the $n^{\text{th}}$ Taylor polynomial of $f$ expanded about $x_0$.


Crumpet 3: $\xi$

$\xi$ is the (lower case) fourteenth letter of the Greek alphabet and is pronounced ksee. It is customary, but, of course, not necessary to use this letter for the unknown quantity in Taylor’s theorem. The capital version of $\xi$ is $\Xi$, a symbol rarely seen in mathematics.
It will not be uncommon, for sake of brevity, to call $T_n(x)$ the $n^{\text{th}}$ Taylor polynomial and $R_n(x)$ the remainder 
term when the function and center of expansion, $x_0$, are either unspecified or clear from context. 

In calculus, you likely focused on the Taylor polynomial, or Taylor series, and did not pay much attention to the 
remainder term. The situation is quite the reverse in numerical analysis. Algorithmic error can often be ascertained 
by careful attention to the remainder term, making it more critical than the Taylor polynomial itself. The Taylor 
polynomial will, however, be used to derive certain methods, so won’t be entirely neglected. 

The most important thing to understand about the remainder term is that it tells us precisely how well $T_n(x)$ approximates $f(x)$. From Taylor’s theorem, $f(x) = T_n(x) + R_n(x)$, so the absolute error in using $T_n(x)$ to approximate $f(x)$ is given by $|T_n(x) - f(x)| = |R_n(x)|$. But $|R_n(x)| = \left|\frac{f^{(n+1)}(\xi)}{(n+1)!}(x-x_0)^{n+1}\right|$ for some $\xi$ between $x$ and $x_0$. Therefore,

\[ |T_n(x) - f(x)| = |R_n(x)| \le \max_{\xi}\left|\frac{f^{(n+1)}(\xi)}{(n+1)!}(x-x_0)^{n+1}\right| = \frac{|x-x_0|^{n+1}}{(n+1)!}\max_{\xi}\left|f^{(n+1)}(\xi)\right|. \]

We learn several things from this observation: 

1. The remainder term is precisely the error in using $T_n(x)$ to approximate $f(x)$. Hence, it is sometimes referred to as the error term.

2. The absolute error in using $T_n(x)$ to approximate $f(x)$ depends on three factors:

(a) $|x-x_0|^{n+1}$
(b) $\frac{1}{(n+1)!}$
(c) $\left|f^{(n+1)}(\xi)\right|$

3. We can find an upper bound on $|T_n(x) - f(x)|$ by finding an upper bound on $\left|f^{(n+1)}(\xi)\right|$.


Figure 1.2.1: For small $n$, $T_n(x)$ is a good approximation only for small $x$. 



Because $|R_n(x)|$ measures exactly the absolute error $|T_n(x) - f(x)|$, we will be interested in conditions that force $|R_n(x)|$ to be small. According to observation 2, there are three quantities to consider. First, $|x - x_0|^{n+1}$, or $|x - x_0|$, the distance between $x$ and $x_0$. The approximation $T_n(x)$ will generally be better for $x$ closer to $x_0$. Second, $\frac{1}{(n+1)!}$. This suggests that the more terms we use in our Taylor polynomial (the greater $n$ is), the better the approximation will be. Finally, $|f^{(n+1)}(\xi)|$, the magnitude of the $(n+1)^{\text{st}}$ derivative of $f$. The tamer this derivative, the better $T_n(x)$ will approximate $f(x)$. Be warned, however, these are just rules of thumb for making $|R_n(x)|$ small. There are exceptions to these rules. 
Figure 1.2.2: The actual error $|T_n(x) - f(x)|$ is often much smaller than the theoretical bound. 



To see these factors in action, consider $f(x) = \ln(x)$ expanded about $x_0 = e^2$. According to Taylor’s theorem,

\[ T_2(x) = 2 + \frac{x - e^2}{e^2} - \frac{(x - e^2)^2}{2e^4} \quad\text{and}\quad R_2(x) = \frac{1}{3\xi^3}(x - e^2)^3, \]

\[ T_{11}(x) = 2 + \sum_{j=1}^{11} \frac{(-1)^{j-1}(x - e^2)^j}{j\,e^{2j}} \quad\text{and}\quad R_{11}(x) = \frac{-1}{12\xi^{12}}(x - e^2)^{12}. \]
After you have convinced yourself these formulas are correct, suppose that we are interested in approximating $\ln(x)$ with an absolute error of no more than 0.1. Since $|\xi^{-3}|$ and $|\xi^{-12}|$ are decreasing functions of $\xi$, they attain their maximum values on a closed interval at the lower endpoint of that interval. Hence, for $x > e^2$, we have

\[ |R_2(x)| \le \max_{\xi\in[e^2,x]}\left|\frac{(x - e^2)^3}{3\xi^3}\right| = \frac{1}{3e^6}(x - e^2)^3. \]

But for $0 < x < e^2$, we have

\[ |R_2(x)| \le \max_{\xi\in[x,e^2]}\left|\frac{(x - e^2)^3}{3\xi^3}\right| = \frac{1}{3x^3}(e^2 - x)^3. \]

To determine where these remainders are less than 0.1, we need to solve the equations $\frac{1}{3e^6}(x - e^2)^3 = 0.1$ and $\frac{1}{3x^3}(e^2 - x)^3 = 0.1$. The values we seek are $x = \left(1 + \sqrt[3]{\tfrac{3}{10}}\right)e^2 \approx 12.33$ and $x = \frac{\sqrt[3]{8100} + 10\sqrt[3]{90} - 30}{13\sqrt[3]{90}}\,e^2 \approx 4.427$. So Taylor’s theorem guarantees that $T_2(x)$ will approximate $\ln(x)$ to within 0.1 over the entire interval [4.427, 12.33]. Since $e^2 \approx 7.389$, $T_2(x)$ approximates $\ln(x)$ to within 0.1 from about 3 below $e^2$ to about 5 above $e^2$. In other words, as long as $x$ is close enough to $x_0 = e^2$, the approximation is good. A similar calculation for $R_{11}(x)$ reveals that $T_{11}(x)$ is guaranteed to approximate $\ln(x)$ to within 0.1 over the interval [3.667, 14.89]. In other words, for a larger value of $n$, $x$ doesn’t need to be as close to $x_0$ to achieve the same accuracy. 
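The endpoints of the guaranteed intervals can be checked numerically. The sketch below assumes the two bound equations above, rewritten as $x = e^2(1+c)$ for $x > e^2$ and $x = e^2/(1+c)$ for $x < e^2$, where $c = \sqrt[3]{0.3}$ for $T_2$ and $c = \sqrt[12]{1.2}$ for $T_{11}$. The variable names are illustrative only.

```octave
x0  = exp(2);                 % center of expansion, e^2
c2  = nthroot(0.3, 3);        % from |x - x0|^3 / (3*m^3) = 0.1
c11 = nthroot(1.2, 12);       % from |x - x0|^12 / (12*m^12) = 0.1
interval2  = [x0/(1 + c2),  x0*(1 + c2)]    % guaranteed interval for T_2
interval11 = [x0/(1 + c11), x0*(1 + c11)]   % guaranteed interval for T_11
```

Here m stands for the smaller of x and x0, which is where the remainder bound takes its maximum. The computed endpoints agree with the intervals quoted above once they are rounded toward the inside of each interval.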

But remember, these are only theoretical bounds on the errors. The actual errors are often much smaller than the bounds. For example, our analysis gives the upper bound $|R_2(3)| \le \frac{1}{3\cdot 3^3}(e^2 - 3)^3 \approx 1.05$ where the actual error is $|T_2(3) - \ln(3)| = \left|2 + \frac{3 - e^2}{e^2} - \frac{(3 - e^2)^2}{2e^4} - \ln(3)\right| \approx .131$. The bound is about 8 times the actual error. If we take this point a bit further, the graphs of $T_2(x)$ and $T_{11}(x)$ versus $\ln(x)$ (and a bit of calculation we will discuss later) reveal that $T_2(x)$ actually approximates $\ln(x)$ to within 0.1 over the interval [3.296, 13.13] and $T_{11}(x)$ actually approximates $\ln(x)$ to within 0.1 over the interval [0.9030, 15.33]. These intervals are a bit larger than the theoretical guaranteed intervals. See Figure 1.2.2. This figure reveals something else too. $T_2(18)$ does a much better job of approximating $\ln(18)$ than does $T_{11}(18)$. It’s not always the case that more terms means a better approximation. 
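The comparison between the theoretical bound and the actual error at $x = 3$ can be reproduced in a few lines. This is a minimal sketch using an anonymous function for $T_2$; the names T2, actual, and bound are illustrative.

```octave
x0 = exp(2);                                        % center of expansion
T2 = @(x) 2 + (x - x0)/x0 - (x - x0).^2/(2*x0^2);   % T_2(x) for ln(x) about e^2
x = 3;
actual = abs(T2(x) - log(x))    % true absolute error at x = 3
bound  = (x0 - x)^3/(3*x^3)     % remainder bound valid for 0 < x < e^2
ratio  = bound/actual           % how pessimistic the bound is
```

The ratio comes out close to 8, matching the statement that the bound is about 8 times the actual error.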

We now turn our attention to perhaps the most often analyzed Taylor polynomials: those for the sine and cosine functions. They provide examples with beautiful visualization and simple analysis. The $n^{\text{th}}$ Taylor polynomial for $f(x) = \cos(x)$ expanded about 0 is

\[ T_n(x) = \cos(0) + \sum_{j=1}^{n} \frac{\left.\frac{d^j}{dx^j}(\cos(x))\right|_{x=0}}{j!}(x - 0)^j \]
\[ = \cos(0) - \sin(0)\cdot x - \frac{\cos(0)}{2}x^2 + \frac{\sin(0)}{6}x^3 + \frac{\cos(0)}{24}x^4 - \cdots \]
\[ = 1 - \frac{1}{2}x^2 + \frac{1}{24}x^4 - \cdots \]
and its remainder term is

\[ R_n(x) = \frac{\left.\frac{d^{n+1}}{dx^{n+1}}(\cos(x))\right|_{x=\xi}}{(n+1)!}(x - 0)^{n+1} = \frac{x^{n+1}}{(n+1)!}\cdot\begin{cases} -\sin(\xi) & \text{when } n \bmod 4 = 0 \\ -\cos(\xi) & \text{when } n \bmod 4 = 1 \\ \sin(\xi) & \text{when } n \bmod 4 = 2 \\ \cos(\xi) & \text{when } n \bmod 4 = 3. \end{cases} \]

Since the sine and cosine functions are bounded between $-1$ and 1 we know that

\[ -\frac{|x|^{n+1}}{(n+1)!} \le R_n(x) \le \frac{|x|^{n+1}}{(n+1)!}. \]

There are two ways this remainder term will be small. First, if $x$ is close to 0, then $|x|$ is small, making $R_n(x)$ small. Second, if $n$ is large, then $\frac{1}{(n+1)!}$ is small, making $R_n(x)$ small. In other words, for small values of $n$, the remainder term is small for small values of $x$. $T_n(x)$ is a good approximation of $\cos(x)$ for such combinations of $x$ and $n$. On the other hand, for large values of $n$, the remainder term is small even for large values of $x$. For example, $|R_{61}(x)| \le \frac{|x|^{62}}{62!}$ so $|R_{61}(x)|$ will remain less than 1 for all $x$ with magnitude less than $\sqrt[62]{62!} \approx 23.933$. Figures 1.2.1 and 1.2.3 illustrate these points. 
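These claims about the cosine polynomial can be tested directly. The sketch below assumes a hypothetical helper, Tcos, that sums the even-power terms of the Maclaurin polynomial for a scalar x. It is one possible way to write it, not the only one.

```octave
% Maclaurin polynomial T_n(x) for cos(x); only even powers contribute.
Tcos = @(x, n) sum(arrayfun(@(j) (-1)^(j/2) * x^j / factorial(j), 0:2:n));

x = 1; n = 4;
approx = Tcos(x, n)                       % 1 - 1/2 + 1/24
actual_error = abs(approx - cos(x))       % true absolute error
bound = abs(x)^(n+1) / factorial(n+1)     % |x|^(n+1)/(n+1)!

radius61 = factorial(62)^(1/62)           % where |x|^62/62! reaches 1
```

The actual error stays below the bound, and radius61 reproduces the value of roughly 23.933 quoted for $R_{61}$.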


Figure 1.2.3: For large $n$, $T_n(x)$ is a good approximation even for large $x$. 



Key Concepts 

Rolle’s theorem: Suppose that $f(x)$ is continuous on $[a,b]$ and differentiable on $(a,b)$. If $f(a) = f(b)$, then there exists $\xi \in (a,b)$ such that $f'(\xi) = 0$.

Taylor’s theorem: Suppose that $f(x)$ has $n+1$ derivatives on $(a,b)$, and $x_0 \in (a,b)$. Then for each $x \in (a,b)$, there exists $\xi$, depending on $x$, lying strictly between $x$ and $x_0$ such that

\[ f(x) = f(x_0) + \sum_{j=1}^{n} \frac{f^{(j)}(x_0)}{j!}(x - x_0)^j + \frac{f^{(n+1)}(\xi)}{(n+1)!}(x - x_0)^{n+1}. \]

$n^{\text{th}}$ Taylor polynomial: $T_n(x) = f(x_0) + \sum_{j=1}^{n} \frac{f^{(j)}(x_0)}{j!}(x - x_0)^j$.

Maclaurin polynomial: A Taylor polynomial expanded about $x_0 = 0$ is also called a Maclaurin polynomial.

Remainder term: $R_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}(x - x_0)^{n+1}$ is precisely $-(T_n(x) - f(x))$.

Error term: Another name for the remainder term. 


Crumpet 4: The original theorem of Brook Taylor 


The original theorem of Brook Taylor was published in his opus magnum Methodus Incrementorum Directa & 
Inversa of 1715 . In Methodus , it appears as the second corollary to Proposition VII Theorem III, bearing faint 
resemblance to any modern statement of the theorem. 


[Facsimile of Prop. VII, Theor. III and Coroll. II as printed in Methodus Incrementorum]


There is no mention of a remainder term. There is no use of the familiar $f(x)$-type function notation. It’s written 
in Latin. And there is no laundry list of hypotheses. 

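Here is the original statement of Taylor’s theorem in English as translated by Ian Bruce. Proposition VII. Theorem III: There are two variable quantities, $z$ & $x$, of which $z$ is regularly increased by the given increment $\underset{\cdot}{z}$, and $n\underset{\cdot}{z} = v$, $v - \underset{\cdot}{z} = \grave{v}$, $\grave{v} - \underset{\cdot}{z} = \grave{\grave{v}}$, and thus henceforth. Moreover, I say that in the time $z$ increases to $z + v$, $x$ increases likewise to become $x + \underset{\cdot}{x}\frac{v}{1\underset{\cdot}{z}} + \underset{\cdot\cdot}{x}\frac{v\grave{v}}{1\cdot 2\underset{\cdot}{z}^2} + \underset{\cdot\cdot\cdot}{x}\frac{v\grave{v}\grave{\grave{v}}}{1\cdot 2\cdot 3\underset{\cdot}{z}^3}$ + &c. Corollary II: If for the evanescent increments, the fluxions of the proportionals themselves are written, now with all the $\grave{\grave{v}}$, $\grave{v}$, $v$, $\acute{v}$, $\acute{\acute{v}}$, &c. equal to the time $z$ uniformly flows to become $z + v$, $x$ becomes $x + \dot{x}\frac{v}{1\dot{z}} + \ddot{x}\frac{v^2}{1\cdot 2\dot{z}^2} + \dddot{x}\frac{v^3}{1\cdot 2\cdot 3\dot{z}^3}$ + &c. ...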
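Crumpet 5: Interpretation of the original theorem of Brook Taylor

Unfortunately, the English translation of Taylor’s theorem is only moderately helpful to anyone who is not well acquainted with early 18th century mathematics. In 1715, function notation was still 20 years in the making. Today, we would interpret the declaration of the two variables as declaring that $x$ is a function of $z$. The claim in Theorem III is that we can rewrite $x(z + v)$ as $x + \underset{\cdot}{x}\frac{v}{1\underset{\cdot}{z}} + \underset{\cdot\cdot}{x}\frac{v\grave{v}}{1\cdot 2\underset{\cdot}{z}^2} + \underset{\cdot\cdot\cdot}{x}\frac{v\grave{v}\grave{\grave{v}}}{1\cdot 2\cdot 3\underset{\cdot}{z}^3}$ + &c. Just as $x$ should be interpreted as a function of $z$ so should $\underset{\cdot}{x}$, $\underset{\cdot\cdot}{x}$, and $\underset{\cdot\cdot\cdot}{x}$. More precisely, $\underset{\cdot}{x}$ means $x(z + \underset{\cdot}{z}) - x(z)$, the amount $x$ is incremented as $z$ is incremented by $\underset{\cdot}{z}$. Likewise, $\underset{\cdot\cdot}{x}$ is the amount $\underset{\cdot}{x}$ is incremented as $z$ is incremented by $\underset{\cdot}{z}$, so

\[ \underset{\cdot\cdot}{x} = \underset{\cdot}{x}(z + \underset{\cdot}{z}) - \underset{\cdot}{x}(z) = \left[x(z + 2\underset{\cdot}{z}) - x(z + \underset{\cdot}{z})\right] - \left[x(z + \underset{\cdot}{z}) - x(z)\right] = x(z + 2\underset{\cdot}{z}) - 2x(z + \underset{\cdot}{z}) + x(z). \]

Similarly, $\underset{\cdot\cdot\cdot}{x}$ is the amount $\underset{\cdot\cdot}{x}$ is incremented as $z$ is incremented by $\underset{\cdot}{z}$. Now would be a good time to break from reading to verify that $\underset{\cdot\cdot\cdot}{x} = x(z + 3\underset{\cdot}{z}) - 3x(z + 2\underset{\cdot}{z}) + 3x(z + \underset{\cdot}{z}) - x(z)$, that $\underset{\cdot\cdot\cdot\cdot}{x} = x(z + 4\underset{\cdot}{z}) - 4x(z + 3\underset{\cdot}{z}) + 6x(z + 2\underset{\cdot}{z}) - 4x(z + \underset{\cdot}{z}) + x(z)$, and so on. With this understanding and the conventions $x_0$ for $x$, $x_1$ for $\underset{\cdot}{x}$, $x_2$ for $\underset{\cdot\cdot}{x}$, $v_0$ for $v$, $v_1$ for $\grave{v}$, $v_2$ for $\grave{\grave{v}}$, and so on, it is then an algebraic exercise to see that

\[ x(z + n\underset{\cdot}{z}) = \sum_{j=0}^{n}\binom{n}{j}x_j = x_0 + x_1\frac{n}{1} + x_2\frac{n(n-1)}{1\cdot 2} + x_3\frac{n(n-1)(n-2)}{1\cdot 2\cdot 3} + \cdots + x_n\frac{n(n-1)\cdots 1}{1\cdot 2\cdot 3\cdots n} \]
\[ = x_0 + x_1\frac{n\underset{\cdot}{z}}{1\underset{\cdot}{z}} + x_2\frac{n\underset{\cdot}{z}(n-1)\underset{\cdot}{z}}{1\cdot 2\underset{\cdot}{z}^2} + x_3\frac{n\underset{\cdot}{z}(n-1)\underset{\cdot}{z}(n-2)\underset{\cdot}{z}}{1\cdot 2\cdot 3\underset{\cdot}{z}^3} + \cdots + x_n\frac{n\underset{\cdot}{z}(n-1)\underset{\cdot}{z}\cdots 1\underset{\cdot}{z}}{1\cdot 2\cdot 3\cdots n\underset{\cdot}{z}^n} \]
\[ = x_0 + x_1\frac{v_0}{1\underset{\cdot}{z}} + x_2\frac{v_0v_1}{1\cdot 2\underset{\cdot}{z}^2} + x_3\frac{v_0v_1v_2}{1\cdot 2\cdot 3\underset{\cdot}{z}^3} + \cdots + x_n\frac{v_0v_1v_2\cdots v_{n-1}}{1\cdot 2\cdot 3\cdots n\underset{\cdot}{z}^n}. \]

This calculation is essentially Taylor’s proof of Theorem III. 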

Corollary II (which we would consider the theorem) is not proved by Taylor beyond the “obvious” application of Newton’s theory of fluxions. In today’s language, corollary II follows by applying the limit as $n \to \infty$ to the expression from Theorem III. It makes for another nice exercise to verify that $\lim_{n\to\infty} \frac{x_k}{\underset{\cdot}{z}^k} = x^{(k)}(z)$, the $k^{\text{th}}$ derivative of $x$. And one final exercise to see that $\lim_{n\to\infty} v_k = v$. As Taylor took these results for granted, so shall we. Applying them to Theorem III, we see that $x(z + v) = x(z) + x'(z)\frac{v}{1} + x''(z)\frac{v^2}{1\cdot 2} + x'''(z)\frac{v^3}{1\cdot 2\cdot 3} + \cdots$. In the notation of Taylor, $\frac{\dot{x}}{\dot{z}}$ is the first derivative of $x$, $\frac{\ddot{x}}{\dot{z}^2}$ is the second derivative of $x$, and so on. So we in fact have $x + \dot{x}\frac{v}{1\dot{z}} + \ddot{x}\frac{v^2}{1\cdot 2\dot{z}^2} + \dddot{x}\frac{v^3}{1\cdot 2\cdot 3\dot{z}^3}$ + &c as claimed.

It is interesting that Theorem III is true for any function $x$ defined on the interval $[z, z + v]$. No matter if $x$ is differentiable, or even continuous. It is a statement about finite differences. It is the corollary that requires many more assumptions because that is where we pass to the limit. 
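The finite-difference identity behind Theorem III can be checked numerically, even for a function that is neither continuous nor differentiable. The following is a sketch under stated assumptions: the test function, the starting point z, the increment dz (standing in for Taylor's increment of $z$), and n are all arbitrary choices made for illustration.

```octave
x  = @(z) abs(z - 2) + floor(z);   % kink at z = 2, jumps at the integers
z  = 1;  dz = 0.5;  n = 4;         % v = n*dz = 2
d  = x(z + (0:n)*dz);              % samples x(z), x(z+dz), ..., x(z+n*dz)
total = 0;
for j = 0:n
  total += nchoosek(n, j) * d(1);  % d(1) is the j-th forward difference at z
  d = diff(d);                     % next order of forward differences
end
total                              % sum of binomial-weighted differences
x(z + n*dz)                        % value the sum is supposed to reproduce
```

Both printed values agree, since the identity is pure algebra on the sampled values and uses no smoothness of the function at all.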


Octave 

Two things that will come in handy time and again when using Octave are inline functions and .m files. Creating 
an inline function is a simple way to make a “custom” function in Octave. Creating a .m file is an organized way 
to execute a number of commands and save your work for later. 

In the last section we saw many built-in functions like sin(x), log(x), and abs(x). These have predefined meaning in Octave. But what if you want to define $f(x) = 3x^2$? There is no built-in “3 x squared” function. That’s where an inline function is useful. The syntax for an inline function is

name = inline('function definition')

where name is the name of the function and function definition is its formula. In the case of $f(x) = 3x^2$, the Octave code looks like f=inline('3*x^2'). Then you can use f the same way you would use sin or log or abs. Write the name of the function, left parenthesis, argument, right parenthesis. So, after defining f with the f=inline('3*x^2') statement, f(7) will result in 147:

octave:1> f=inline('3*x^2');
octave:2> f(7)
ans = 147
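Newer Octave documentation discourages inline in favor of anonymous functions, so here is a sketch of the same function written that way. This is a swapped-in alternative, not the syntax used in the text above. The elementwise operator .^ is an added choice so that f also accepts vectors.

```octave
f = @(x) 3*x.^2;     % anonymous function handle for f(x) = 3x^2
f(7)                 % same call syntax as before
f([1 2 3])           % .^ lets f act on each entry of a vector
```

The handle is called exactly like an inline function, so the rest of the discussion applies unchanged.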

Now we may complete Experiment 1 of section 1.1 a fourth way. Instead of doing the computations on the 
command line, we can create a text file with the commands in it. Saved as a .m file, Octave will recognize it as a list 
of instructions. If you are familiar with programming, this way of working with Octave will come very naturally. 
Writing a .m file is the equivalent of writing a program. After it is written, it needs to be processed. On the Octave 
command line, a .m file is run by typing the name of the file, without the .m. That’s it, so it isn’t exactly like 
writing a program. There is no compiling. It’s a little bit more like scripting that way. 

To begin, use any text editor you like to create the list of commands. Note well, Microsoft Word, LibreOffice, 
and other word processors are not text editors. They are word processors. They have font formatting features, 
page set up features, and so on. Now imagine your last report or letter to Mom and remove all the formatting, 
save separation of paragraphs. That’s a text file. No bold, no centering, no images, no special fonts, no margins, 
no pages. Just the typed words. There is no need for all the decorations a word processor allows. All Octave needs 
is a list of commands. The only formatting you will need is the line feed (new line) and tabs. If you don’t already 
have a favorite text editor (and maybe even if you do), you should use the one that comes with Octave. If you use 
this program, you will have no problems. So, first create the text document experiment1.m exactly as shown here: 

format('long')
p1 = 10*pi-31
p2 = 100*p1-41
p3 = 100*p2-59
p4 = 100*p3-26
p5 = 100*p4-53
p6 = 100*p5-58
p7 = 100*p6-97

Then, on the Octave command line, type experiment1 to get the results:

octave:1> experiment1
p1 = 0.415926535897931
p2 = 0.592653589793116
p3 = 0.265358979311600
p4 = 0.535897931159980
p5 = 0.589793115997963
p6 = 0.979311599796347
p7 = 0.931159979634685

This way of writing Octave commands has two distinct advantages. First, if you make errors, it’s a simple matter 
to correct them. Just edit the text file and save the changes. Second, you have a record of your work. You can 
share it, print it, or just save it for later. There is only one real disadvantage. It’s more involved than just executing 
a few commands on the command line. So, for simple computations, it is more headache than necessary. 
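Because a .m file is just a list of commands, repetitive steps can also be written as a loop. The following sketch is a hypothetical variant of experiment1.m; the file name experiment1_loop.m and the vector d are assumptions introduced here, not part of the original experiment.

```octave
% experiment1_loop.m (hypothetical file name)
format('long')
d = [31 41 59 26 53 58 97];   % digit pairs subtracted at each step
p = 10*pi - d(1)              % first step multiplies by 10
for k = 2:7
  p = 100*p - d(k)            % later steps multiply by 100
end
```

Each pass through the loop prints one intermediate value, and the arithmetic is the same as in the seven-line script, so the final value should match the p7 shown above.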

Note well that the .m file has to be saved in the same directory from which Octave was started. This type of 
detail will be taken care of for you if you use an IDE, but if you are using a command line and text editor, you 
need to be sure .m files are saved to the proper location. 


Exercises 


1. Find $T_3(x)$ and $R_3(x)$ for the function expanded about $x_0$.

(a) $f(x) = \sin(x)$; $x_0 = 0$.
(b) $f(x) = \sin(x)$; $x_0 = \pi/2$.
(c) $f(x) = \sin(x)$; $x_0 = \pi$.
(d) $f(x) =$ [illegible]
(e) $f(x) = e^x$; $x_0 = \ln 2$.
(f) $f(x) = x\sin(x)$; $x_0 = 0$.
(g) $f(x) = \cos^2(x)$; $x_0 = 0$.


2. Let $f(x) = 4x^3 - 2x^2 + 8x - 9$.

(a) Find $T_3(x)$ and $R_3(x)$ expanded about $x_0 = 0$.
(b) Find $T_3(x)$ and $R_3(x)$ expanded about $x_0 = 2$.
(c) Make a conjecture based on your answers to parts (a) and (b). Can you prove it?

3. Find the 36th Maclaurin polynomial for $f(x) = e^x$.

4. Suppose $f(x)$ is a function whose fourth derivative exists on the whole real line, $(-\infty, \infty)$, and that $f(2) = 3$, $f'(2) = -1$, $f''(2) = 2$, and $f'''(2) = -1$.

(a) Write down the third Taylor polynomial for $f(x)$ expanded about $x_0 = 2$.
(b) Use the Taylor polynomial to approximate $f(4)$.
(c) Find a bound on the absolute error of the approximation using the fact that $-3 \le f^{(4)}(\xi) \le 5$ for all $\xi \in [2,4]$.

5. Compute the 3rd Taylor polynomial for $f(x) = x^5 - 2x^4 + x^3 - 9x^2 + x - 1$ expanded about $x_0 = 1$.

6. Find the second Taylor polynomial for $f(x) = \csc x$ expanded about $x_0 = \pi/$[illegible]. Here are some facts you may find useful:

\[ f'(x) = -\csc(x)\cot(x), \qquad f''(x) = \csc(x)\left(1 + 2\cot^2(x)\right), \qquad \csc(x) = \frac{1}{\sin(x)}, \qquad \cot(x) = \frac{\cos(x)}{\sin(x)}. \]

7. The hyperbolic sine, $\sinh(x)$, and hyperbolic cosine, $\cosh(x)$, are derivatives of one another. That is,

\[ \frac{d}{dx}(\sinh(x)) = \cosh(x) \quad\text{and}\quad \frac{d}{dx}(\cosh(x)) = \sinh(x). \]

Find the remainder term, $R_{43}$, associated with the 43rd Maclaurin polynomial for $f(x) = \cosh(x)$.

8. Use an inline function to evaluate the Taylor polynomial $T_4(x) = 1 - \frac{1}{2}x^2 + \frac{1}{24}x^4$ at the given value of $x$. [S]

(a) 0
(b) [illegible]
(c) 1
(d) $\pi$

9. Use an inline function to evaluate the Taylor polynomial $T_3(x) = 1 + x + \frac{1}{2}x^2 + \frac{1}{6}x^3$ at the given value of $x$.

(a) 0
(b) [illegible]
(c) 2
(d) $e$ [A]

10. Write and run a .m file that finds all the answers for exercise 8.

11. Write and run a .m file that finds all the answers for exercise 9.

12. Find $\xi(x)$ as guaranteed by Taylor’s theorem in the following situation.

(a) $f(x) = \cos(x)$, $x_0 = 0$, $n = 3$, $x = \pi$.
(b) $f(x) = e^x$, $x_0 = 0$, $n = 3$, $x = \ln 4$.
(c) $f(x) = \ln(x)$, $x_0 = 1$, $n = 4$, $x = 2$.


13. Let $f(x) = x^3$.

(a) Find the second Taylor polynomial, $P_2(x)$, about $x_0 = 0$.
(b) Find the remainder term, $R_2(0.5)$, and the actual error in using $P_2(0.5)$ to approximate $f(0.5)$.
(c) Repeat part (a) using $x_0 = 1$.
(d) Repeat part (b) using the polynomial from part (c).

14. Find the second Taylor polynomial, $P_2(x)$, for $f(x) = e^x \cos x$ about $x_0 = 0$.

(a) Use $P_2(0.5)$ to approximate $f(0.5)$. Find an upper bound on the error $|f(0.5) - P_2(0.5)|$ using the remainder term and compare it to the actual error.
(b) Find a bound on the error $|f(x) - P_2(x)|$ good on the interval $[0,1]$.
(c) Approximate $\int_0^1 f(x)\,dx$ by calculating $\int_0^1 P_2(x)\,dx$ instead.
(d) Find an upper bound for the error in (c) using $\int_0^1 |R_2(x)|\,dx$ and compare the bound to the actual error.

15. Let $f(x) = e^x$.

(a) Find the $n^{\text{th}}$ Maclaurin polynomial $P_n(x)$ for $f(x)$.
(b) Find a bound on the error in using $P_1(2)$ to approximate $f(2)$.
(c) How many terms of the Maclaurin polynomial would you need to use in order to approximate $f(2)$ to within $10^{-10}$? In other words, for what $n$ does $P_n(2)$ have an error bound less than or equal to $10^{-10}$?

16. Find the fourth Taylor polynomial for $\ln x$ expanded about $x_0 = 1$.

17. What is the 50th term of $T_{100}(e^x)$ expanded about $x_0 = 6$?

18. The Maclaurin series for the arctangent function converges for $-1 < x \le 1$ and is given by

\[ \arctan x = \lim_{n\to\infty} P_n(x) = \lim_{n\to\infty} \sum_{i=1}^{n} (-1)^{i+1}\frac{x^{2i-1}}{2i-1}. \]

Use the fact that $\tan(\pi/4) = 1$ to determine the number of terms, $n$, of the series that need to be summed to ensure that $|4P_n(1) - \pi| < 10^{-3}$.

19. Exercise 18 details a rather inefficient means of obtaining an approximation to $\pi$. The method can be improved substantially by observing that $\pi/4 = \arctan\frac{1}{2} + \arctan\frac{1}{3}$ and evaluating the series for the arctangent at $\frac{1}{2}$ and at $\frac{1}{3}$. Determine the number of terms that must be summed to ensure an approximation to $\pi$ within $10^{-3}$.

20. For $f(x) = \tan^{-1}(x)$,

\[ f^{(n)}(0) = \begin{cases} 0 & \text{if } n \text{ is even} \\ (-1)^{(n-1)/2}(n-1)! & \text{if } n \text{ is odd.} \end{cases} \]

Find the $n^{\text{th}}$ Maclaurin polynomial $P_n(x)$ for $f$.
21. How many terms of the Maclaurin series of $\sin x$ are needed to guarantee an approximation with error no more than $10^{-2}$ for any value of $x$ between 0 and $2\pi$?

22. Suppose you are approximating $f(x) = e^x$ using the tenth Maclaurin polynomial. Find the largest interval over which the approximation is guaranteed to be accurate to within $10^{-3}$.

23. Find a bound on the error in approximating $e^{10}$ by using the twenty-fifth Taylor polynomial of $g(x) = e^x$ expanded about $x_0 = 0$.

24. Find a bound on the error of the approximation

\[ e^2 \approx 1 + 2 + \frac{1}{2}(2)^2 + \frac{1}{6}(2)^3 + \frac{1}{24}(2)^4 + \frac{1}{120}(2)^5 \]

according to Taylor’s Theorem. Compare this bound to the actual error.

25. Suppose $f^{(8)}(x) = e^x \cos x$ for some function $f$. Find a bound on the error in approximating $f(x)$ over the interval $[0, \pi/2]$ using $T_7(x)$ expanded about $x_0 = 0$.

26. Let $f(x) =$ [illegible], and $x_0 = 5$.

(a) Find $T_2(x)$.
(b) Find $R_2(x)$.
(c) Use $T_2(x)$ to approximate $f(1)$ and $f(9)$.
(d) Find a theoretical upper bound on the absolute error of each of the approximations in part (c).
(e) Find a theoretical lower bound on the absolute error of each of the approximations in part (c).
(f) Find the actual absolute error for each of the approximations in part (c). Verify that they are indeed between the theoretical bounds.
(g) Sketch graphs of $f(x)$ and $T_2(x)$ on the same set of axes for $x \in [1,9]$.


27. Let $f(x) = \ln(1 + x)$ and $x_0 = 0$.

(a) Find $T_3(x)$.
(b) Find $R_3(x)$.
(c) Use $T_3(x)$ to approximate $f(1)$ and $f(26)$.
(d) Find a theoretical upper bound on the absolute error of each of the approximations in part (c).
(e) Find a theoretical lower bound on the absolute error of each of the approximations in part (c).
(f) Find the actual absolute error for each of the approximations in part (c). Verify that they are indeed between the theoretical bounds.
(g) Sketch graphs of $f(x)$ and $T_3(x)$ on the same set of axes for $x \in [1, 26]$.

28. Suppose $f(x)$ is such that $-3 \le f^{(10)}(x) \le 7$ for all $x \in [0, 10]$. Find lower and upper bounds on the absolute error in using $T_9(x)$ expanded about $x_0 = 3$ to approximate

(a) $f(0)$.
(b) $f(10)$.

29. Suppose you wish to approximate the value of $-e^4 \sin 4$ using separate Maclaurin polynomials (Taylor polynomials expanded about $x_0 = 0$) for the sine and exponential functions instead of a single Maclaurin polynomial for the function $f(x) = -e^x \sin x$. How many terms of each would you need in order to get accuracy within $10^{-20}$? Ignore round-off error.

30. Find a theoretical upper bound, as a function of $x$, for the absolute error in using $T_j(x)$ to approximate $f(x)$.

(a) $e^x \sin x$; $x_0 = 0$.
(b) $e^{-x^2}$; $x_0 = 0$. [S]
(c) [illegible] $+ \sin(10x)$; $x_0 = \pi$.

31. The Maclaurin series for $f(x) = e^{-x}$ is

\[ \sum_{i=0}^{\infty} \frac{(-1)^i x^i}{i!} = 1 - x + \frac{1}{2}x^2 - \frac{1}{6}x^3 + \cdots \]

Find a bound on the error in approximating $1/e$ by $1 - 1 + 1/2 - 1/6 + 1/24$.

32. The Taylor series for $f(x) = e^x$ is

\[ T(x) = 1 + x + \frac{1}{2}x^2 + \frac{1}{6}x^3 + \frac{1}{24}x^4 + \frac{1}{120}x^5 + \cdots. \]

This series converges to $f(x)$ for all values of $x$. In particular, for $x = 1$, this means that

\[ f(1) = T(1) = 1 + 1 + \frac{1}{2}(1)^2 + \frac{1}{6}(1)^3 + \frac{1}{24}(1)^4 + \cdots \]

Simplifying this equation, we see that

\[ e = 1 + 1 + \frac{1}{2} + \frac{1}{6} + \frac{1}{24} + \frac{1}{120} + \cdots \]

Use Taylor Series to find infinite sums that sum to

(a) $\ln(2)$
(b) $2/3$
(c) $\pi/4$
(d) $\sqrt{2}$
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1.3 Speed 

Besides accuracy, there is nothing more important about a numerical method than speed. There is almost always a 
trade-off between one and the other, however. Fast computations are often not particularly accurate, and accurate 
calculations are often not particularly fast. There are certain algorithms that produce accurate results quickly, 
however. Deriving them, or identifying them once derived is what numerical analysis is all about. 

The first type of numerical method we will encounter produces a sequence of approximations that, when
everything is working, approach some desired value, say $p$. With these methods, we will get a sequence $(p_n)$ with
$\lim_{n\to\infty} p_n = p$. You should be familiar with the concept of the limit of a sequence from Calculus, but the purpose
there was much different from ours here. Generally, you were concerned with whether a given sequence converged
at all. And when it did converge, and you were very lucky, you were able to determine the limit. In numerical
analysis, we know certain sequences converge, and are only interested in how quickly they do so.

Simple observation (and a little common sense) can tell you which cars on a highway are traveling faster than
which. Simple observation (and a little common sense) will also often tell you which sequences converge faster
than which. Consider the sequences in Table 1.4, which all converge to $e \approx 2.71828182845904$.


Table 1.4: Some sequences that converge to e.

  n    q_n                r_n                s_n                t_n
  0    3                  3                  3                  3
  1    2.9436563656918    2.86799618929986   2.82129001274358   2.78177393100014
  2    2.89858145824525   2.78315514435127   2.73850656616954   2.72150682612711
  3    2.86252153228801   2.73974041668143   2.71973377603211   2.71829014894701
  4    2.83367359152222   2.72324781752852   2.71830229432561   2.71828182851442
  5    2.81059523890958   2.71899828870116   2.71828184916891   2.71828182845904
  6    2.79213255681947   2.71833715075158   2.71828182845934   2.71828182845904
  7    2.77736241114739   2.71828369688657   2.71828182845904   2.71828182845904
  8    2.76554629460972   2.71828184959225   2.71828182845904   2.71828182845904
  9    2.75609340137958   2.71828182851528   2.71828182845904   2.71828182845904
 10    2.74853108679547   2.71828182845907   2.71828182845904   2.71828182845904


The sequence $(t_n)$ is accurate to 15 significant digits by the sixth term; $(s_n)$ is accurate to 15 significant
digits by the eighth term; $(r_n)$ is still not accurate to 15 significant digits by the eleventh term, but seems likely
to gain 15 significant digits of accuracy on the twelfth term; and $(q_n)$ is only accurate to 2 significant digits by
the eleventh term, so seems likely to take considerably more than twelve terms to gain 15 significant digits of
accuracy. Since they all started at 3, it seems reasonable to say that, ordered from fastest to slowest, they are
$(t_n)$, $(s_n)$, $(r_n)$, $(q_n)$. And that is correct as we will see soon. But just like knowing which cars are faster
than which is different from knowing how fast each is going, knowing which sequences converge faster than which is
different from knowing how quickly each one converges. To measure the speed of a given car, you need access to its
speedometer or a radar gun. To measure the order of convergence (speed) of a sequence, you need a definition and a
little algebra.


Order of convergence of a sequence

Suppose the sequence $(p_n)$ converges to $p$. Then we say $(p_n)$ converges with order $\alpha \ge 1$ if

$$\lim_{n\to\infty} \frac{|p_{n+1} - p|}{|p_n - p|^{\alpha}} = \lambda$$

for some real number $\lambda > 0$.

Let's see how to use this definition to calculate the orders of convergence of the sequences in Table 1.4. According
to the definition, $\alpha$, should it exist, gives the speed (or order) of convergence of a sequence. Now assuming that
$\alpha$ does exist, we have that $\lim_{n\to\infty} \frac{|p_{n+1} - p|}{|p_n - p|^{\alpha}} = \lambda$, so for large enough $n$,

$$\frac{|p_{n+1} - p|}{|p_n - p|^{\alpha}} \approx \frac{|p_{n+2} - p|}{|p_{n+1} - p|^{\alpha}} \approx \lambda.$$

In particular, we can solve for $\alpha$ to find

$$\alpha \approx \frac{\ln\left|\frac{p_{n+2} - p}{p_{n+1} - p}\right|}{\ln\left|\frac{p_{n+1} - p}{p_n - p}\right|}.$$


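Crumpet 6: Order of Convergence Less than or equal to 1?

There is no such thing as an order of convergence less than one because if
$\lim_{n\to\infty} \frac{|p_{n+1} - p|}{|p_n - p|^{\alpha}} = \lambda$ for some $0 < \alpha < 1$, then

$$\lim_{n\to\infty} \frac{|p_{n+1} - p|}{|p_n - p|} = \lim_{n\to\infty} \frac{|p_{n+1} - p|}{|p_n - p|^{\alpha}} \cdot |p_n - p|^{\alpha - 1},$$

a contradiction. On the one hand, the ratio test implies that $\lim_{n\to\infty} \frac{|p_{n+1} - p|}{|p_n - p|}$ exists and is
less than or equal to 1. On the other hand, $\alpha < 1 \Rightarrow \alpha - 1 < 0$ so for $|p_n - p|$ small,
$|p_n - p|^{\alpha - 1}$ is large. Hence, $\lim_{n\to\infty} \frac{|p_{n+1} - p|}{|p_n - p|^{\alpha}} \cdot |p_n - p|^{\alpha - 1}$ does not exist.
To be rigorous, let $M$ be any positive real number. Then there exists an $N_1$ such that $n > N_1$ implies
$\frac{|p_{n+1} - p|}{|p_n - p|^{\alpha}} > 0.9\lambda$. There also exists $N_2$ such that $n > N_2$ implies
$|p_n - p| < \left(\frac{M}{0.9\lambda}\right)^{1/(\alpha - 1)}$, so $|p_n - p|^{\alpha - 1} > \frac{M}{0.9\lambda}$.
Letting $N = \max\{N_1, N_2\}$ we have that $n > N$ implies both $\frac{|p_{n+1} - p|}{|p_n - p|^{\alpha}} > 0.9\lambda$ and
$|p_n - p|^{\alpha - 1} > \frac{M}{0.9\lambda}$. Hence, for $n > N$, we have

$$\frac{|p_{n+1} - p|}{|p_n - p|} = \frac{|p_{n+1} - p|}{|p_n - p|^{\alpha}} \cdot |p_n - p|^{\alpha - 1} > 0.9\lambda \cdot \frac{M}{0.9\lambda} = M.$$

Therefore, $\lim_{n\to\infty} \frac{|p_{n+1} - p|}{|p_n - p|}$ does not exist. When $\alpha = 1$, it must be that $\lambda \le 1$ because
otherwise the ratio test implies that $(|p_n - p|)$ diverges, and, therefore, $(p_n)$ diverges.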


For example,

$$\frac{\ln\left|\frac{q_2 - e}{q_1 - e}\right|}{\ln\left|\frac{q_1 - e}{q_0 - e}\right|} \approx \frac{\ln\left|\frac{2.8985 - e}{2.9436 - e}\right|}{\ln\left|\frac{2.9436 - e}{3 - e}\right|} \approx 1
\quad\text{and}\quad
\frac{\ln\left|\frac{q_{10} - e}{q_9 - e}\right|}{\ln\left|\frac{q_9 - e}{q_8 - e}\right|} \approx \frac{\ln\left|\frac{2.7485 - e}{2.7560 - e}\right|}{\ln\left|\frac{2.7560 - e}{2.7655 - e}\right|} \approx 1.$$

And if we try other sets of three consecutive terms of $(q_n)$, we get the same results. The order of convergence of
$(q_n)$ is about 1. Of course, we would need a formula for $|q_n - e|$ to determine whether the limit were truly 1, but
we have some evidence. Repeating the calculations for $(r_n)$, $(s_n)$, and $(t_n)$, we get approximate orders of
convergence 1.322, 1.618, and 2, respectively. Again we see that, ordered from fastest to slowest, they are $(t_n)$,
$(s_n)$, $(r_n)$, $(q_n)$.
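
For readers who would rather let the machine do the arithmetic, here is a minimal Octave sketch of the same estimate applied to every set of three consecutive terms of the slowest sequence in Table 1.4. The variable names (q, err, order_estimates) are illustrative choices, not names used elsewhere in the text, and the values are simply copied from the table.

```octave
% Estimate the order of convergence from three consecutive terms.
% q holds q_0 through q_10 as printed in Table 1.4; the limit is e.
q = [3, 2.9436563656918, 2.89858145824525, 2.86252153228801, ...
     2.83367359152222, 2.81059523890958, 2.79213255681947, ...
     2.77736241114739, 2.76554629460972, 2.75609340137958, ...
     2.74853108679547];
err = abs(q - exp(1));        % absolute errors |q_n - e|
% ln(err_{n+2}/err_{n+1}) / ln(err_{n+1}/err_n) for n = 0, 1, ..., 8
order_estimates = log(err(3:end) ./ err(2:end-1)) ./ ...
                  log(err(2:end-1) ./ err(1:end-2));
disp(order_estimates)
```

Every entry displayed should be very close to 1, in line with the hand calculation above.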

If you attempted to calculate the orders of convergence yourself, you may have noticed that more information is
needed to use $s_n$ with $n > 6$ or $t_n$ with $n > 4$. All of these terms in the table are equal, so the formula for
$\alpha$ fails to produce a real number! A more useful table for calculating orders of convergence is one listing
absolute errors:


Table 1.5: Absolute errors.

  n    |q_n - e|         |r_n - e|         |s_n - e|         |t_n - e|
  0    2.817(10)^{-1}    2.817(10)^{-1}    2.817(10)^{-1}    2.817(10)^{-1}
  1    2.253(10)^{-1}    1.497(10)^{-1}    1.03(10)^{-1}     6.349(10)^{-2}
  2    1.802(10)^{-1}    6.487(10)^{-2}    2.022(10)^{-2}    3.224(10)^{-3}
  3    1.442(10)^{-1}    2.145(10)^{-2}    1.451(10)^{-3}    8.32(10)^{-6}
  4    1.153(10)^{-1}    4.965(10)^{-3}    2.046(10)^{-5}    5.538(10)^{-11}
  5    9.231(10)^{-2}    7.164(10)^{-4}    2.07(10)^{-8}     2.453(10)^{-21}
  6    7.385(10)^{-2}    5.532(10)^{-5}    2.953(10)^{-13}   4.817(10)^{-42}
  7    5.908(10)^{-2}    1.868(10)^{-6}    4.263(10)^{-21}   1.856(10)^{-83}
  8    4.726(10)^{-2}    2.113(10)^{-8}    8.777(10)^{-34}   2.757(10)^{-166}
  9    3.781(10)^{-2}    5.623(10)^{-11}   2.608(10)^{-54}   6.084(10)^{-332}
 10    3.024(10)^{-2}    2.22(10)^{-14}    1.595(10)^{-87}   2.961(10)^{-663}


In addition to making it easier to calculate $\alpha$, this chart makes it painfully obvious that our common sense
conclusion about which sequences converge faster than which was quite right. Just compare the accuracy (absolute
errors) of the eleventh terms.

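So now we can calculate orders of convergence, but what does it all mean? What does the order of convergence
tell us about successive terms in the sequence? Solving the approximation
$\frac{|p_{n+1} - p|}{|p_n - p|^{\alpha}} \approx \lambda$ gives us that $|p_{n+1} - p| \approx \lambda|p_n - p|^{\alpha}$. So, roughly
speaking, convergence of order $\alpha$ means that, for large enough $n$, the approximation $p_{n+1}$ is about
$\lambda|p_n - p|^{\alpha - 1}$ times closer to the limit $p$ than is $p_n$. To rephrase in terms of significant digits of
accuracy, a little bit of algebra:

$$|p_{n+1} - p| \approx \lambda|p_n - p|^{\alpha}$$

$$\left|\frac{p_{n+1} - p}{p}\right| \approx \lambda\left|\frac{p_n - p}{p}\right|^{\alpha} |p|^{\alpha - 1}$$

$$-\log\left|\frac{p_{n+1} - p}{p}\right| \approx -\alpha\log\left|\frac{p_n - p}{p}\right| - \log\left(\lambda|p|^{\alpha - 1}\right)$$

$$d(p_{n+1}) \approx \alpha\, d(p_n) - \log\left(\lambda|p|^{\alpha - 1}\right).$$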

Based on this calculation, we conclude these rules of thumb:


1. for linear convergence ($\alpha = 1$), $d(p_{n+1}) \approx d(p_n) - \log\lambda$, so each term has a fixed number more
significant digits of accuracy (approximately equal to $-\log\lambda$) than the previous;

2. for quadratic convergence ($\alpha = 2$), $d(p_{n+1}) \approx 2d(p_n) - \log(\lambda|p|)$, so each term has double the
number of significant digits of accuracy of the previous, give or take some;

3. for cubic convergence ($\alpha = 3$), $d(p_{n+1}) \approx 3d(p_n) - \log(\lambda|p|^2)$, so each term has triple the number
of significant digits of accuracy of the previous, give or take some;


and so on. Summarizing, for large $n$, you can expect that each term will have $-\log(\lambda|p|^{\alpha - 1})$ more than
$\alpha$ times as many significant digits of accuracy as the previous term. We can see this claim in action by
calculating $\lambda$ for the sequences $(t_n)$, $(s_n)$, $(r_n)$, and $(q_n)$. Using the fact that
$\lambda \approx \frac{|p_{n+1} - p|}{|p_n - p|^{\alpha}}$, we find that $\lambda \approx 0.8$ for each sequence.
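
As a quick check of that value, the following Octave sketch estimates the constant for each of the four sequences from the first two rows of Table 1.5, using the approximate orders found earlier (1, 1.322, 1.618, and 2). The variable names are illustrative only, and using just the first two rows is an assumption made for brevity; other consecutive rows could be used the same way.

```octave
% Estimate lambda = |p_{n+1} - p| / |p_n - p|^alpha with n = 0 (Table 1.5).
err0    = 2.817e-1;                                % common starting error |3 - e|
err1    = [2.253e-1, 1.497e-1, 1.03e-1, 6.349e-2];  % errors of q_1, r_1, s_1, t_1
alphas  = [1, 1.322, 1.618, 2];                    % approximate orders from the text
lambdas = err1 ./ err0 .^ alphas;                  % one estimate per sequence
disp(lambdas)
```

Each of the four estimates should come out close to 0.8.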

Therefore, $(q_n)$ should show each term having $-\log 0.8 \approx .1$ more significant digits of accuracy than the previous.
More sensibly, this means the sequence will show about one more significant digit of accuracy every ten terms.
This is borne out by observing that $q_0$ has error about $3(10)^{-1}$ while $q_{10}$ has error about $3(10)^{-2}$. For
$(r_n)$, we should expect each term to have about $-\log(0.8 \cdot e^{.322}) \approx -0.04$ more than 1.322 times as many
significant digits of accuracy as the previous. For example, $r_3$ has about
$-\log\left|\frac{2.145(10)^{-2}}{e}\right| \approx 2.1$ significant digits of accuracy while $r_4$ has about
$1.322(2.1) - .04 \approx 2.73$ significant digits of accuracy, $r_5$ has $1.322(2.73) - .04 \approx 3.57$ significant digits
of accuracy, and so on until $r_8$ has about 8.1 significant digits of accuracy. Again this is borne out by the table
as $-\log\left|\frac{r_8 - e}{e}\right| \approx -\log\left(\frac{2.113(10)^{-8}}{e}\right) \approx 8.1$. Though we can do a similar
calculation for $(t_n)$, it's easier just to eyeball it since all we need to see is that the exponent in the scientific
notation doubles, give or take a little, from one term to the next. Indeed it does as it goes from 1 to 2 to 3 to 6
to 11, and so on.
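
The chain of estimates for the second-slowest sequence is easy to automate. The sketch below, with illustrative variable names, starts from the number of significant digits of the n = 3 term (computed from its error in Table 1.5) and applies the rule of thumb five times, using the approximate values 1.322 and 0.8 quoted above. Because it does not round the intermediate results, its numbers differ slightly from the hand calculation.

```octave
alpha  = 1.322;                                % approximate order for the r sequence
lambda = 0.8;                                  % approximate constant from the text
shift  = log10(lambda * exp(1)^(alpha - 1));   % log(lambda*|p|^(alpha-1)) with p = e
sig_digits = zeros(1, 6);                      % predictions for r_3, r_4, ..., r_8
sig_digits(1) = -log10(2.145e-2 / exp(1));     % d(r_3) from the error in Table 1.5
for k = 2:6
  sig_digits(k) = alpha * sig_digits(k-1) - shift;
end%for
disp(sig_digits)
```

The last entry is the prediction for the n = 8 term and should land near the 8.1 digits read from the table.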

Note that in all this analysis, we have ignored the requirement that n be “large”. That was acceptable in this 
case since these sequences were contrived so that even n = 0 was large enough! In practical applications this will 
not be the case. 

To appreciate just how much faster one order of convergence is over another, consider the relation

$$d(p_{n+1}) \approx \alpha\, d(p_n) - \log\left(\lambda|p|^{\alpha - 1}\right)$$

again. Now suppose we know that $d(p_{n_0}) = d_{n_0}$ for some particular $n_0$ large enough that the approximation is
reasonable. Then it can be shown that, for $\alpha > 1$,

$$d(p_{n_0 + k}) \approx (d_{n_0} - C)\alpha^k + C
\quad\text{where}\quad
C = \frac{\log\left(\lambda|p|^{\alpha - 1}\right)}{\alpha - 1}.$$

Crumpet 7: Solving a Recurrence Relation


The relation $d(p_{n+1}) \approx \alpha\, d(p_n) - \log(\lambda|p|^{\alpha - 1})$ is an example of a recurrence relation. In
particular, a first order linear nonhomogeneous recurrence relation with constant coefficients since it has the form

$$a_{n+1} = k_1 a_n + k_2$$

where $k_1$ and $k_2$ are constants. Linear nonhomogeneous recurrence relations can be solved by summing a
homogeneous solution and a particular solution. For the particular solution, we seek a solution of the form $a_n = A$
(for all $n$) by substituting this assumed solution into the recurrence relation. Doing so gives $A = k_1 A + k_2$, so
$A = \frac{k_2}{1 - k_1}$ is such a solution. For the homogeneous solution, we seek a sequence of the form $a_n = r^n$
that satisfies $a_{n+1} = k_1 a_n + 0$. Substituting our assumed solution into the modified (homogeneous) recurrence
relation gives $r^{n+1} = k_1 r^n$. Rearranging, $r^n(r - k_1) = 0$ so $r = 0$ or $r = k_1$. Notice that $B k_1^n$ is also a
solution for any constant $B$. This includes the solution $a_n = 0$ which would arise from setting $r = 0$. Finally,
putting the particular and homogeneous solutions together, the solution of $a_{n+1} = k_1 a_n + k_2$ is
$a_n = B k_1^n + \frac{k_2}{1 - k_1}$ for any constant $B$. In the case of
$d(p_{n+1}) \approx \alpha\, d(p_n) - \log(\lambda|p|^{\alpha - 1})$, $k_1 = \alpha$ and $k_2 = -\log(\lambda|p|^{\alpha - 1})$ so
$d(p_n) = B\alpha^n + \frac{\log(\lambda|p|^{\alpha - 1})}{\alpha - 1}$. The value of $B$ is determined by substituting any
known element of the sequence into this formula and solving for $B$. Supposing $d(p_{n_0}) = d_{n_0}$ yields

$$d(p_n) = \left(d_{n_0} - \frac{\log(\lambda|p|^{\alpha - 1})}{\alpha - 1}\right)\alpha^{n - n_0} + \frac{\log(\lambda|p|^{\alpha - 1})}{\alpha - 1}.$$


The important thing to see here is that $d(p_{n_0 + k})$ is an exponential function when $\alpha > 1$. The number of
significant digits of accuracy grows exponentially with base $\alpha$. As we saw before, for $\alpha = 1$, the number of
significant digits grows linearly. In calculus you learned that any exponential function grows much faster than any
polynomial function, so it is reasonable and correct to conclude that sequences converging with orders greater than 1
are markedly faster converging than are sequences converging with linear ($\alpha = 1$) order.
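
To see the closed form at work, here is a small Octave sketch that iterates the significant-digit recurrence directly and compares it with the exponential formula. It assumes quadratic convergence with constant 0.8 and limit e, and it starts from the digits of the first term of the fastest sequence in Table 1.5; all variable names are illustrative.

```octave
alpha = 2;  lambda = 0.8;  p = exp(1);     % assumed order, constant, and limit
L  = log10(lambda * abs(p)^(alpha - 1));   % the constant subtracted at every step
C  = L / (alpha - 1);
d0 = -log10(2.817e-1 / p);                 % digits of the starting term (n = 0)
K  = 6;                                    % number of steps to take
d_rec = zeros(1, K + 1);
d_rec(1) = d0;
for k = 1:K
  d_rec(k+1) = alpha * d_rec(k) - L;       % the recurrence, one step at a time
end%for
d_closed = (d0 - C) * alpha .^ (0:K) + C;  % the closed form for k = 0, 1, ..., K
disp([d_rec; d_closed])
```

The two rows displayed should agree up to round-off, and the entry for the fourth step should be close to the roughly 10.7 digits implied by the error of about 5.5(10)^{-11} listed for n = 4 in Table 1.5.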

But be careful. Based on this same memory of calculus, you would also conclude that the sequence $(2^{-n})$
converges to 0 much faster than does $(n^{-2})$. By some measures, that's true, but not by all measures. Consider the
orders of convergence of these two sequences. We seek values $\alpha_1$ and $\alpha_2$ such that

$$\lim_{n\to\infty} \frac{|2^{-(n+1)} - 0|}{|2^{-n} - 0|^{\alpha_1}} = \lambda_1
\quad\text{and}\quad
\lim_{n\to\infty} \frac{|(n+1)^{-2} - 0|}{|n^{-2} - 0|^{\alpha_2}} = \lambda_2$$

for some nonzero real numbers $\lambda_1$ and $\lambda_2$. A little bit of algebra will lead to solutions:

$$\frac{|2^{-(n+1)} - 0|}{|2^{-n} - 0|^{\alpha_1}} = \frac{2^{-n-1}}{2^{-\alpha_1 n}} = 2^{(\alpha_1 - 1)n - 1}$$

while

$$\frac{|(n+1)^{-2} - 0|}{|n^{-2} - 0|^{\alpha_2}} = \frac{n^{2\alpha_2}}{(n+1)^2} = \frac{n^{2\alpha_2}}{n^2 + 2n + 1}.$$

The only way $\lim_{n\to\infty} 2^{(\alpha_1 - 1)n - 1}$ will be a nonzero constant is if $\alpha_1 = 1$. The only way
$\lim_{n\to\infty} \frac{n^{2\alpha_2}}{n^2 + 2n + 1}$ will be a nonzero constant is if the numerator and denominator have the
same degree. That means $\alpha_2$ must be 1 as well. So $(2^{-n})$ and $(n^{-2})$ both converge to zero with linear order.
They are equally extremely slow to converge by this measure! Still, something should not feel quite right about
claiming that $(2^{-n})$ and $(n^{-2})$ converge at the same speed.
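
A numerical experiment points the same way. The Octave sketch below applies the three-consecutive-term estimate of the order of convergence to both sequences for a few values of n. The chosen values of n and the variable names are arbitrary illustrative choices.

```octave
n = 10:10:50;
% Three consecutive errors of each sequence (the limit is 0, so the terms are the errors).
a1 = 2 .^ (-n);   b1 = 2 .^ (-(n + 1));   c1 = 2 .^ (-(n + 2));
a2 = n .^ (-2);   b2 = (n + 1) .^ (-2);   c2 = (n + 2) .^ (-2);
order_exp = log(c1 ./ b1) ./ log(b1 ./ a1);   % estimates for the exponential sequence
order_pow = log(c2 ./ b2) ./ log(b2 ./ a2);   % estimates for the power sequence
disp([n; order_exp; order_pow])
```

The estimates for the exponential sequence are exactly 1, while those for the power sequence sit a little below 1 and creep up toward it as n grows.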


Rate of Convergence of a Sequence

For sequences that converge with linear order, we need a finer measure than order to determine which is faster than
which. Recall from calculus,

$$\lim_{n\to\infty} \frac{2^{-n}}{n^{-2}} = \lim_{n\to\infty} \frac{n^2}{2^n} = \lim_{n\to\infty} \frac{2n}{2^n \ln 2} = \lim_{n\to\infty} \frac{2}{2^n (\ln 2)^2} = 0,$$

indicating that $(2^{-n})$ approaches 0 much faster than does $(n^{-2})$. You may also recall comparisons between power
functions:

$$\lim_{n\to\infty} \frac{n^{-p}}{n^{-q}} = 0$$

whenever $p > q > 0$; and between exponential functions:

$$\lim_{n\to\infty} \frac{a^{-n}}{b^{-n}} = 0$$

whenever $a > b > 1$; and between the two:

$$\lim_{n\to\infty} \frac{a^{-n}}{n^{-p}} = 0$$

whenever $a > 1$. In other words, sequences of the form $\left(\frac{1}{a^n}\right)$ converge to zero faster than sequences of
the form $\left(\frac{1}{n^p}\right)$ whenever $a > 1$. The sequence $\left(\frac{1}{a^n}\right)$ converges to zero faster than
$\left(\frac{1}{b^n}\right)$ whenever $a > b > 1$. The sequence $\left(\frac{1}{n^p}\right)$ converges to zero faster than
$\left(\frac{1}{n^q}\right)$ whenever $p > q > 0$. Not all functions are as simple as these, but we can use these as our
yard sticks. Suppose $(p_n)$ converges to $p$, $(b_n)$ converges to 0 and $|p_n - p| \le \lambda|b_n|$ for some constant
$\lambda$ and all sufficiently large $n$. Then we say that $(p_n)$ converges to $p$ with rate of convergence $O(b_n)$, read
"big-oh of $b_n$". Since we are familiar with sequences of the forms $\left(\frac{1}{a^n}\right)$ for some constant $a > 1$
and $\left(\frac{1}{n^p}\right)$ for some constant $p > 0$, and they are simple enough, typically $(b_n)$ will be one of
them. For example, $\left(\frac{2n + 1}{4n}\right)$ converges to $\frac{1}{2}$ and

$$\left|\frac{2n + 1}{4n} - \frac{1}{2}\right| = \frac{1}{4n} = \frac{1}{4} \cdot \frac{1}{n}$$

so $\left(\frac{2n + 1}{4n}\right)$ converges with rate $O\left(\frac{1}{n}\right)$. We may also say that
$\frac{2n + 1}{4n} = \frac{1}{2} + O\left(\frac{1}{n}\right)$ to convey exactly the same message.

Normally, when we find a rate of convergence, we try to find the fastest converging sequence from our stock of 
simple examples that satisfies the definition. In this case, there is none faster. 
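
The definition can be checked numerically. This short Octave sketch, with illustrative variable names and an arbitrary choice of sample indices, divides the absolute error of the example sequence (2n+1)/(4n) by the comparison sequence 1/n; the ratio stays bounded, which is all that the definition of rate of convergence asks for.

```octave
n     = [1, 10, 100, 1000, 10000];    % a few sample indices
p     = (2*n + 1) ./ (4*n);           % terms of the example sequence
ratio = abs(p - 1/2) ./ (1 ./ n);     % error divided by the comparison sequence 1/n
disp(ratio)
```

The ratio is 1/4 for every n, matching the algebra above.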

Basically all the sequences studied in any depth in calculus converge with linear order. So what does it take to
converge with a higher order? Let's have a look at $(e^{-2^n})$.

$$\lim_{n\to\infty} \frac{|e^{-2^{n+1}} - 0|}{|e^{-2^n} - 0|^{\alpha}} = \lim_{n\to\infty} \frac{e^{-2 \cdot 2^n}}{e^{-\alpha 2^n}} = 1$$

when $\alpha = 2$. So $(e^{-2^n})$ is quadratically convergent. Essentially, it takes an exponentially growing exponent to
converge with an order greater than 1.
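
Here is a brief Octave check of that limit, offered as a sketch with illustrative variable names. It stops at n = 8 because the terms soon become too small for double precision to represent.

```octave
n     = 0:8;
p     = exp(-2 .^ n);                   % terms of the sequence; the limit is 0
ratio = p(2:end) ./ p(1:end-1) .^ 2;    % |p_{n+1} - 0| / |p_n - 0|^2
disp(ratio)
```

Every ratio equals 1 up to round-off, consistent with quadratic convergence.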


Crumpet 8: Approximating $\pi$


The sequence

$$\frac{1103 \cdot 2^{3/2}}{9801},\quad \frac{1130173253125}{313826716467 \cdot 2^{7/2}},\quad \frac{1029347477390786609545}{1116521080257783321 \cdot 2^{23/2}},\ \ldots$$

converges to $\frac{1}{\pi}$. Its terms are given by the formula

$$\frac{\sqrt{8}}{9801} \sum_{j=0}^{n} \frac{(4j)!\,(1103 + 26390j)}{(j!)^4\, 396^{4j}}, \quad n = 0, 1, 2, 3, \ldots$$

of Srinivasa Ramanujan. For all practical purposes, it converges very quickly. The first term already has about
8 significant digits of accuracy:

$$\frac{1103 \cdot 2^{3/2}}{9801} \approx 0.31830987844047012321768445317$$
$$\frac{1}{\pi} \approx 0.31830988618379067153776752674,$$

and the second has about 16:

$$\left|\frac{1130173253125}{313826716467 \cdot 2^{7/2}} - \frac{1}{\pi}\right| \approx 6.48(10)^{-17},$$

double the accuracy of the first term. The third term is already more than double-precision accurate.

It’s tempting to believe, or hope, the sequence is quadratically convergent, but it is not. The third term has 
an accuracy of about 24 significant digits. Each term in the sequence is approximately 8 significant digits more 
accurate than the previous — the hallmark of a linearly convergent sequence. 
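
The first few partial sums are easy to reproduce. The Octave sketch below, with illustrative variable names, accumulates three terms of the series and displays the absolute errors. Keep in mind that Octave works in double precision, so it can confirm the error of the first term but cannot resolve errors much smaller than about (10)^{-16}.

```octave
partial  = zeros(1, 3);     % partial sums for n = 0, 1, 2
running  = 0;
for j = 0:2
  term = factorial(4*j) * (1103 + 26390*j) / (factorial(j)^4 * 396^(4*j));
  running = running + sqrt(8)/9801 * term;
  partial(j+1) = running;
end%for
abs_err = abs(partial - 1/pi);   % absolute errors, limited by double precision
disp(abs_err)
```

The first error is a bit under 8(10)^{-9}; the later ones are at or below the level of round-off.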
Key Concepts

Order of convergence: The sequence $(p_n)$ converges to $p$ with order of convergence $\alpha \ge 1$ if

$$\lim_{n\to\infty} \frac{|p_{n+1} - p|}{|p_n - p|^{\alpha}} = \lambda$$

for some real number $\lambda > 0$.

Absolute error: For a sequence $(p_n)$ that converges to $p$ with order $\alpha$, the absolute errors of consecutive terms
are related by the approximation

$$|p_{n+1} - p| \approx \lambda|p_n - p|^{\alpha}$$

for large enough $n$.

Significant digits of accuracy: For a sequence $(p_n)$ that converges to $p$ with order $\alpha$, the numbers of significant
digits of accuracy of consecutive terms are related by the approximation

$$d(p_{n+1}) \approx \alpha\, d(p_n) - \log\left(\lambda|p|^{\alpha - 1}\right)$$

for large enough $n$. In closed form (for $\alpha \ne 1$)

$$d(p_{n+k}) \approx (d_n - C)\alpha^k + C
\quad\text{where}\quad
C = \frac{\log\left(\lambda|p|^{\alpha - 1}\right)}{\alpha - 1}.$$

Rate of convergence: The sequence $(p_n)$ converges to $p$ with rate of convergence $O(b_n)$ if $(b_n)$ converges to 0 and

$$|p_n - p| \le \lambda|b_n|$$

for some constant $\lambda$ and all sufficiently large $n$.


Octave 

An invaluable tool in any kind of programming is looping. When you need to perform some procedure multiple 
times for varying input, a loop is probably the right solution. While there are several types of loops available in 
Octave, we will discuss only for loops right now. The idea is to have a variable, sometimes called a counter, that 
counts how many times the procedure has been performed. When the procedure has been performed the desired 
number of times, the looping ends, and the program continues from there. You almost certainly encountered this 
idea before you ever wrote a computer program. If you ever went to the fair and paid a dollar to toss a dozen rings 
in hopes of landing one on the neck of a soda bottle, you have experienced looping. You may have even counted 
the rings as you tossed them. You were the counter! You had to perform the procedure of throwing a ring into 
the field of bottles 12 times. So, perhaps you threw one and counted to yourself “1”. Then you threw another and 
counted “2”. And another and counted “3”. And so on through “12”. When the last ring was tossed, you continued 
about your day at the fair. 

The for loop is an abstract analogy of this situation. Suppose you want to calculate 1!, 2!, 3!, and so on through 
12!. In Octave, you could create the following .m file and run it. 

factorial (1) 
factorial (2) 
factorial (3) 
factorial (4) 
factorial (5) 
factorial (6) 
factorial (7) 
factorial (8) 
factorial (9) 
factorial (10) 
factorial (11) 
factorial (12) 
But this can be tedious and not particularly reader-friendly, especially if we are interested in doing some computation
many more than 12 times. The purpose of the loop is to reduce the repetitiveness of this approach. We want to
perform the procedure of calculating the factorial of 12 different integers, so a loop is appropriate. The syntax for
the loop is to set up the counter, write the code to perform the procedure, and mark the end of the loop. It looks
something like this.

for j=first:last
  do something.
end%for


This will cause Octave to perform the procedure once for each integer from first to last, including both first 
and last. The value of the counter, j in this case, may be used in the procedure. So to calculate 1! through 12!, 
we might write 

for j=1:12
  factorial(j)
end%for


This will produce exactly the same output as the program with one line for each factorial. And if later you want 
to calculate 1! through 20! instead, all you have to do is change the 12 to a 20. The for loop is your friend! 

Now suppose we want to calculate $\alpha$ for each set of three consecutive values of $|s_n - e|$ from Table 1.5. Since
there are 9 such sets, we need to create a loop that will run through 9 times. And inside the loop, we will need to
perform the calculation

$$\alpha \approx \frac{\ln\left|\frac{s_{n+2} - e}{s_{n+1} - e}\right|}{\ln\left|\frac{s_{n+1} - e}{s_n - e}\right|}.$$

But before we can start, we need to tell Octave about the 11 values from the table. The most convenient way to do so
is in an array. An array is like a vector. It has components. In this case, each component will hold one value from
the table. And the syntax for creating the array is a lot like vector notation. We will use square brackets to
delimit the components of the array, and we will separate the components by commas. So, the first line of our Octave
code will look like this.


errs = [2.817*10^(-1), 1.03*10^(-1), 2.022*10^(-2), 1.451*10^(-3), ...
        2.046*10^(-5), 2.07*10^(-8), 2.953*10^(-13), 4.263*10^(-21), ...
        8.777*10^(-34), 2.608*10^(-54), 1.595*10^(-87)]


The ellipses (three consecutive dots) at the ends of the first two lines are needed to tell Octave that the command 
continues onto the next line. Without them, separating a single command over multiple lines will cause a syntax 
error. Starting a new line in Octave is the signal to start a new command as well. 

Now Octave knows the values of $|s_n - e|$. Using this vector is a lot like using subscripts. The first value,
$2.817(10)^{-1}$, is called errs(1). The second is called errs(2). The third is called errs(3), and so on. The length
of the array errs can be retrieved using the length() function of Octave. The command length(errs)-2 will be
used instead of hard-coding the 9. So we can finish the Octave code like so.


errs = [2.817*10^(-1), 1.03*10^(-1), 2.022*10^(-2), 1.451*10^(-3), ...
        2.046*10^(-5), 2.07*10^(-8), 2.953*10^(-13), 4.263*10^(-21), ...
        8.777*10^(-34), 2.608*10^(-54), 1.595*10^(-87)];
for j=1:length(errs)-2
  alpha = log(errs(j+2)/errs(j+1))/log(errs(j+1)/errs(j))
end%for


This code produces these results: 

alpha = 1.6182 

alpha = 1.6181 

alpha = 1.6176 

alpha = 1.6182 

alpha = 1.6180 

alpha = 1.6180 

alpha = 1.6180 

alpha = 1.6180 

alpha = 1.6180 
Not bad, but we can do better. Let's calculate $\alpha$, $\lambda$, and $d(s_n)$ by two different methods: directly and
using the formula $d(p_{n+1}) \approx \alpha\, d(p_n) - \log(\lambda|p|^{\alpha - 1})$. Then let's display the results in a nicely
formatted table.

We will need the disp() command and a two-index array. The disp() command is used to display some text
or some quantity. When used for text, the text needs to be delimited by single quotation marks. When used for
quantities, not. So, we might have an Octave program output the word "hello" with the command disp('hello')
or have it output the value of ln(2) with the command disp(log(2)). The disp() command can also handle
variables, so if p1 and p2 have been assigned values, then we can display their difference using disp(p2-p1). A
two-index array can be thought of as a table, or a matrix. It holds values in what can be imagined as rows and
columns. So, instead of having errs(j) as we did before, we may have errs(j,k) where j indicates the row and
k indicates the column. The program

A(2,4) = 7; 
disp(A) ; 

produces 

0 0 0 0 

0 0 0 7 

OK, back to the task at hand. We will combine everything we have learned about Octave into one program. 

errs = [2.817*10^(-1), 1.03*10^(-1), 2.022*10^(-2), 1.451*10^(-3), ...
        2.046*10^(-5), 2.07*10^(-8), 2.953*10^(-13), 4.263*10^(-21), ...
        8.777*10^(-34), 2.608*10^(-54), 1.595*10^(-87)];
d = inline('-log(x/exp(1))/log(10)');
for j=1:9
  % alpha:
  T(j,1) = log(errs(j+2)/errs(j+1))/log(errs(j+1)/errs(j));
  % lambda:
  T(j,2) = errs(j+2)/errs(j+1)^T(j,1);
  % d (explicit):
  T(j,3) = d(errs(j+2));
end

alpha = 1.61804;
lambda = 0.8;
constant = log(lambda*exp(alpha-1))/log(10);
T(1,4) = T(1,3);
for j=2:9
  % d (recursive)
  T(j,4) = alpha * T(j-1,4) - constant;
end%for

disp('   alpha     lambda    d (expl)   d (rec)');
disp(' ');
disp(T)

produces 


   alpha     lambda    d (expl)   d (rec)

   1.61816    0.80015    2.12851    2.12851
   1.61814    0.80010    3.27263    3.27252
   1.61764    0.79855    5.12339    5.12356
   1.61822    0.80158    8.11832    8.11863
   1.61797    0.79941   12.96403   12.96477
   1.61804    0.80045   20.80458   20.80601
   1.61805    0.80059   33.49095   33.49346
   1.61804    0.80031   54.01799   54.02225
   1.61804    0.80031   87.23153   87.23866
It is worth taking some time to make sure you understand all the lines of this program. It uses assignment, built-in
functions, inline functions, simple output, arrays, and for loops. The % on a line tells Octave to ignore it and
everything on the line that follows. These tidbits are called comments. They are strictly for the human user to
document what the program does. Lengthy programs should always be documented so any user of the program will
be better able to understand what it does. Here the comments are simple, but they may be much more elaborate.
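
One optional variation, offered only as a sketch: the Octave documentation discourages inline in favor of anonymous functions, so the significant-digits function in the program above can also be written with the @ syntax. The version below additionally applies the function to a whole slice of the array at once instead of looping. The names d and digits_expl are illustrative, and the numbers it prints should match the d (expl) column of the output above.

```octave
errs = [2.817*10^(-1), 1.03*10^(-1), 2.022*10^(-2), 1.451*10^(-3), ...
        2.046*10^(-5), 2.07*10^(-8), 2.953*10^(-13), 4.263*10^(-21), ...
        8.777*10^(-34), 2.608*10^(-54), 1.595*10^(-87)];
% Anonymous function: significant digits of accuracy for an absolute error x
% when the limit is e. This takes the place of the inline(...) definition.
d = @(x) -log(x / exp(1)) / log(10);
digits_expl = d(errs(3:end));   % the function accepts a whole array at once
disp(digits_expl)
```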


Exercises 

1. Some convergent sequences and their limits are given. 
Find the order of convergence for each. 


(a) 

(b) 

(c) 

(d) 

(e) 


n n 

1 


2 2 — 2 
2 2 " +3 

n 2 

1 + n 2 

e? 
e® r 


[A] 


n + 1 


-^> converges to 1 lin- 
1-2 is quadratically 


2. Show that the sequence 
early. 

3. Show that the sequence p n = 2 
convergent. 

4. Give an example of a sequence which converges to 0 
with order a = 10. 

5. Approximate the order of convergence of the sequence 
p n and explain your answer. 


n 

Pn + 1-P| 

\Vn-v\'-' Z 

|Pn+l-p| 

Ipn-Pl 1 ' 3 

|Pn + l-p| 
IPn-pl 1 ' 4 

25 

9.07(10) _e 

.0110 

13.39 

26 

1.88(10) -7 

.00303 

48.65 

27 

28 

l.Ol(lO)" 9 

.000530 

277.8 


6. Some linearly convergent sequences and their limits are 
given. Find the (fastest) rate of convergence of the form 
or O (T) for each. If this is not possible, sug- 
gest a reasonable rate of convergence. 


, x , 6 6_ _6_ 

5 *7 7 A O’ Q/tQ 


6 


7’ 49’ 343’ 2401’ 
Hn — 2 \ u 


0 


n + 3 
sinn 


(b) 

(c) 

(d) 

(e) 

(f) 

(g) 

\5 n + 3 1 

(h) n + 47 - y/n) - 

1 

3 


n 


10 n + 35n + 9 
4 

10’ 1 - 35n - 9 
2 n 

%/n 2 + 3n 
5 n - 2 


7 0 
7 0 

2 [A] 


1 


[A] 


(i) 


3n 2 + 1 


\ e n — 7r n , 


(k) 


• 0 


( l ) ( , T + ” ( f ") \->0 
+ 1 


8n 2 


(m) ( — 1 

v ; \ 3n 2 + 12 3n + 10 

, . / 2 n 2 + 3 n 

(n) 

(o) 


1 - n 2 ' ' 2 

3n 5 — 5 n\ 

1 — n 5 / 


7. Find the rates of convergence of the following sequences 


as n — 7 oo 

(a) lim sin — = 0 

n—too 71 

(b) lim sin = 0 

n — 7 nn ^1^ 

1 


sin — J =0 


(c) lim I sir 

71— >-00 \ 

(d) lim [ln(n + 1) — ln(n)] = 0 

n—too 

For questions 8 through 12, use the following definition for rate of convergence for a function.
For a function $f(h)$, we say $\lim_{h\to a} f(h) = L$ with
rate of convergence $g(h)$ if $|f(h) - L| \le \lambda|g(h)|$ for some
$\lambda > 0$ and all sufficiently small $|h - a|$.

8. Use a Taylor polynomial to find the rate of convergence
of

$$\lim_{h\to 0} (2 - e^h) = 1.$$

9. Use a Taylor polynomial to find the rate of convergence
of

$$\lim_{h\to 0} \frac{\sin(h) - e^h + 1}{h} = 0.$$

10. Find rates of convergence for the following functions as
$h \to 0$.

(a) $\lim_{h\to 0} \frac{\sin h}{h} = 1$

(b) $\lim_{h\to 0} \frac{1 - \cos h}{h} = 0$

(c) $\lim_{h\to 0} \frac{\sin h - h\cos h}{h} = 0$

(d) $\lim_{h\to 0} \frac{1 - e^h}{h} = -1$

11. Find the rate of convergence of

$$\lim_{h\to 0} \frac{h^2 + \cos h - e^h}{h} = -1.$$

12. Show that

$$(\sin h)(1 - \cos h) = 0 + O(h^3).$$

13. Write an Octave program (.m file) that uses a loop
and the disp() command to produce the following output
(powers of 7).

1 

7 

49 

343 

2401 

16807 

117649 

823543 

5764801 

40353607 

14. Write an Octave program (.m file) that uses a loop
and the disp() command to output the first 10 powers
of 5 starting with $5^0$.

15. * • Write an Octave program (,m file) that uses a loop, 
an array, and the disp() command to find the values of 

f(n) = + g for n = 0, 1, 2, 4, 6, 10. [s] 

16. Write an Octave program (.m file) that uses a loop,
an array, and the disp() command to find the values of

$$f(n) = \frac{2n}{\sqrt{n^2 + 3n}}$$

for n = 0, 2, 5, 10, 100, 1000, 20000.

17. The following Octave code is intended to calculate
the sum

$$\sum_{k=1}^{30} \frac{1}{k^2}$$

but it does not. Find as many mistakes in the code as
you can. Classify each mistake as either a compilation
error (an error that will prevent the program from running
at all) or a bug (an error that will not prevent the
program from running, but will cause improper calculation
of the sum).


sum=1;
for k=1:30
sum=sum+1.0/k*k;
end
disp(sum)

18. Some sequences do not have an order of convergence. 
Let p n = 

(a) Show that limn-xx, p n = 0. 

(b) Show that lim n ->oo = 0. 

(c) Show that diverges for any a > 1. 

19. Use the rules of thumb for order of convergence to
approximate the number of iterations it will take to
achieve 12 significant digits of accuracy of $\pi$ for each
order of convergence. Assume each sequence starts with
one significant digit of accuracy.

(a) $\alpha = 1$, $\lambda = 0.8$

(b) $\alpha = 1$, $\lambda = 0.5$ [S]

(c) $\alpha = 1$, $\lambda = 0.1$

(d) $\alpha = 1.5$

(e) $\alpha = 2$ [A]

(f) $\alpha = 3$

20. Prove that the order of convergence of a sequence is 
unique. 

21. * • Write a for loop that outputs the sequence of num- 
bers. 

(a) 7,8,9,10,11,12,13,14,15 

(b) 20,19,18,17,16,15,14,13 

(c) 12, 12.333, 12.667, 13, 13.333, 13.667, 14 

(d) 1,9, 25, 49, 81, 121, 169, 225, 289, 361, 441 

(e) 1, .5, .25, .125, .0625, .03125, .015625 
1.4 Recursive Procedures

The Mathemagician 

Mathemagician: I have here an ordinary bed sheet. Nothing up my sleeves. No secret pockets. Maybe just a
touch of magic dust. But other than that, an ordinary bed sheet. When laid flat it is, of course, one
layer thick. As I take these corners in my hands and place them over the opposite corners, folding the
bed sheet in half, how many layers thick does it become?

Audience: Two! 

Mathemagician: Very good. Allow me to fold it in half again. Now how many layers thick has it become? 
Audience: Four! 

Mathemagician: Excellent. Watch very closely as I fold it for a third time. Think hard and tell me how many 
layers thick is the folded sheet now. 

Audience: Six! (from a few) Eight! (from more) 

Mathemagician: That’s right. Eight. So much for the warm up. I shall now have my lovely assistant bring 
out another perfectly ordinary bed sheet. This time already folded. Crystal! The bed sheet please ... 
(Crystal brings out the bed sheet, already folded). Again, an ordinary bed sheet. This time folded. 
I shall now fold it in half as I have done before and ask again, how many layers thick has the sheet 
become? 

Audience: (Mostly silent, just some murmurings)

Mathemagician: I see. Well, I don’t know either... 

Audience: (Laughing) 

Mathemagician: ...but I can tell you it is twice as many layers thick as it was before! 

Audience: (Mostly silent, just a few groans)

Mathemagician: I know. I know. A cheap parlor trick. But wait! Watch as I slowly unfold the sheet, one fold at 
a time. One! ... Two! (he peers toward the sky as if in thought) ... Three! ... (again seemingly deep in 
thought) ... Four! ... Four times folded in half and now, as you can plainly see, the sheet is three layers 
thick. The first fold was in thirds, (he peers off into space, waves his wand, stares deep into the eyes of 
the audience) Forty-eight!!! 

Audience: (Silent but clearly wanting an explanation)

Mathemagician: The sheet started 3 layers thick, and was doubled in thickness four times ... 3 ... 6 ... 12 ... 24 ... 48.

Though it was meant to seem like a wise crack, the observation that folding a sheet in half doubles the number of 
layers was the key to counting the layers in the folded sheet. Recursive procedures are magical in the same way. 
They seem to hold nothing of value when, in fact, they hold the key. They are based on the principle that no matter 
what the current state of affairs (no matter how many layers thick the sheet is), following the procedure (folding it
in half) will produce a predictable result (double the thickness). 
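The Mathemagician's count can be sketched in a few lines of Octave. This is only an illustration: the variable names are made up for the sketch, and the starting thickness of 3 layers and the 4 folds are taken from the story above. Each pass through the loop applies the one rule we know, namely that a fold doubles whatever thickness came before.

```octave
layers = 3;               % the second sheet turned out to start 3 layers thick
for fold = 1:4            % four folds in half
  layers = 2*layers;      % the rule: one fold doubles the previous thickness
  disp(layers)            % displays 6, 12, 24, 48 in turn
end%for
```

Notice the loop never needs a formula for the thickness after a given number of folds. Knowing the previous thickness and the rule is enough.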

Perhaps the simplest numerical example of this idea comes from thinking of a bag of marbles — an opaque bag 
with an unknown number of marbles inside. One marble is added, and you are asked how many are inside. Of 
course the best you can say is something like “one more than there were before.” Even though you do not know 
how many marbles are in the bag to begin with, when one is added to the bag, you know the new total is one more 
than the previous total. This is recursive thinking. 
Figure 1.4.1: 2 x 3 and 6 x 9 grids can be tiled with trominos.


Figure 1.4.2: A $2^n \times 2^n$ grid can be (almost) tiled recursively.


Trominos 

Connect three squares edge-to-edge in the shape of an L, and you have a tromino. Trominos aren’t used in games 
like dominoes are, but are often used in interesting mathematical questions involving tiling. Tiling with trominos 
means covering without overlapping trominos and without having any parts of trominos lying outside the shape 
being tiled. For example, a 2 x 3 grid can be tiled with trominos as can a 6 x 9 grid. See Figure 1.4.1. If n is a 
positive integer, then a $2^n \times 2^n$ grid can almost be tiled with trominos. All but one square can be covered. Try it,
first with a 2 x 2 grid. That one’s not too hard. Then try it with a 4 x 4 grid or an 8 x 8 grid. 

How about a 1024 x 1024 grid? I can’t recommend that you actually get yourself a 1024 x 1024 grid of squares 
and start filling in with trominos. It would take 349,525 trominos. You may not finish in your lifetime! Instead, 
it is time to start thinking recursively. Use the previous result in your answer. The same way you can just say 
the marble bag “has one more than before”, we can phrase the solution to tiling the 1024 x 1024 grid in terms of 
the tiling of the 512 x 512 grid. Here’s how it goes. Take a 1024 x 1024 grid and section it off into four 512 x 512 
subgrids by dividing it down the middle both horizontally and vertically. In the upper left 512 x 512 grid, tile all 
but the bottom right corner. In the lower left 512 x 512 grid, tile all but the upper right corner. In the lower right 
grid, tile all but the upper left corner. Finally, in the upper right 512 x 512 grid, tile all but the upper right corner 
(Figure 1.4.2). This leaves room for a single L-shaped tromino in the middle, and one square left over. That’s it! 
It should feel a little bit like cheating since we didn’t specify how to deal with the 512 x 512 grid, but the same 
argument applies to the 512 x 512 grid. You can section it off into four subgrids, tile those and be done. 

The same tiling argument can be made for any $2^n \times 2^n$ grid based on the $2^{n-1} \times 2^{n-1}$ tiling, except when n = 1.
Figure 1.4.3: The 32 x 32 grid recursively tiled.



You just have to tile the 2 x 2 grid yourself! But once that’s done, you have a complete solution for any $2^n \times 2^n$
grid. A similar exception applies to every recursive procedure. The recursion is only good most of the time. At 
some point, you have to get your hands dirty and supply a solution or answer. Such an answer is often called an 
initial condition. 
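For the curious, here is one way the recursive tiling might be carried out in Octave. It is a sketch under stated assumptions, not code from the companion website: the function name tileGrid(), the numbering of corners from 1 to 4, and the idea of recording the tiling as a matrix of tromino labels are all inventions of this example. The recursion follows the description above. Each quadrant is tiled save the corner touching the center (or, for one quadrant, save the corner we want left over), and one more tromino fills the middle. The 2 x 2 grid is handled directly as the initial condition.

```octave
1;  % a leading non-function statement makes Octave treat this file as a script

% tileGrid() is a hypothetical helper written for this sketch.
% INPUT: exponent n >= 1; corner to leave uncovered (1 = upper left,
%        2 = upper right, 3 = lower left, 4 = lower right); next unused label.
% OUTPUT: 2^n x 2^n matrix G whose entries label the tromino covering each
%         square (0 marks the uncovered corner); the next unused label.
function [G, next] = tileGrid(n, corner, next)
  if (n == 1)
    % Initial condition: one tromino covers the 2 x 2 grid, save one corner.
    G = next*ones(2, 2);
    r = 1 + (corner > 2);        % row of the uncovered corner
    c = 2 - mod(corner, 2);      % column of the uncovered corner
    G(r, c) = 0;
    next = next + 1;
  else
    h = 2^(n-1);
    % Each quadrant leaves uncovered the corner touching the center of the
    % grid, except the quadrant that holds the corner to be left over.
    leave = [4 3 2 1];
    leave(corner) = corner;
    [Q1, next] = tileGrid(n-1, leave(1), next);   % upper left quadrant
    [Q2, next] = tileGrid(n-1, leave(2), next);   % upper right quadrant
    [Q3, next] = tileGrid(n-1, leave(3), next);   % lower left quadrant
    [Q4, next] = tileGrid(n-1, leave(4), next);   % lower right quadrant
    G = [Q1 Q2; Q3 Q4];
    % One more tromino covers the three open squares around the center.
    ri = [h, h, h+1, h+1];
    ci = [h, h+1, h, h+1];
    for q = 1:4
      if (q != corner)
        G(ri(q), ci(q)) = next;
      end%if
    end%for
    next = next + 1;
  end%if
end%function

[G, next] = tileGrid(3, 2, 1);   % 8 x 8 grid, upper right corner left over
disp(G)                          % equal entries mark squares of one tromino
```

Printing G for a small grid and tracing the labels by hand is a good way to convince yourself that the recursion really does what the figures suggest.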


Crumpet 9: Proof by induction 


Proof by induction also uses a sort of recursive thinking. In the method, one must prove that a claim is true for 
some value of the variable. This part is analogous to having an initial condition. Then one must prove that the 
truth of the claim for the value n implies the truth of the claim for n + 1. This is analogous to the recursive 
relationship between states. In fact, the construction of a tiling for the $2^n \times 2^n$ grid based on the $2^{n-1} \times 2^{n-1}$
grid plus the tiling of the 2 x 2 grid just presented essentially form a proof by induction that the $2^n \times 2^n$ grid,
save one corner, can be tiled by trominos for any n ≥ 1. In this way, all proofs by induction boil down to the
ability to see the recursive relationship between states. 

In 1954, Solomon Golomb published a proof by induction that the $2^n \times 2^n$ grid minus any single square (not
necessarily a corner), called a deficient square, can be tiled by trominos. Can you construct a (recursive) tiling
of a $2^n \times 2^n$ deficient square? You may use the tiling of a $2^k \times 2^k$ grid minus one corner in your construction.

Reference [12] 


Octave 

Custom functions 

As any modern useful programming language does, Octave allows custom functions beyond those that can be written 
as a single inline formula. Let’s say you are interested in the maximum value a function takes over an evenly 
spaced set of values. That function has a very special purpose and is not commonly used. Consequently, it is not 
built into any programming language, so if you really want a function that does that, it is your job to write it. 
Similarly, if you want a function that calculates the symmedian point of a triangle, you need to write it. In fact, 
most anything computational beyond evaluating basic functions will not be built into Octave. 

Custom functions are written around three basic pieces of information: a name for the function, a list of inputs, 
and a description of the output. These three things should be well defined before the work of writing the function 
begins. Actually writing the function involves simply telling Octave the desired name, inputs, and how to determine
the output. The basic format for a function is this: 

function ans = myName(input1, input2, ...)
  ans = final answer;
end%function

The first line holds the name of the function and a list of inputs. The rest of the function is dedicated to computing 
the output, ans. 

The function that determines the maximum value of a function over an evenly spaced set of values might be 
written following these steps. First, we decide to name it “maxOverMesh”. Notice there are no spaces and no special 
characters in the name. There’s a very limited supply of non-alphabetic characters that can go into the name of a 
function. It’s usually safe to assume an underscore and numbers are acceptable, but you can’t count on anything 
else! It’s best to keep it at that. Second, we need to think about what inputs are necessary for this function. Of 
course, the function to maximize is required, and somehow the mesh of points where it should be checked needs to 
be specified. There are multiple ways to do this, but perhaps the one that is easiest for the user is to require the 
lower end point, upper end point, and number of intervals in the mesh. Finally, we need to write some code that 
will take those inputs and determine the maximum value of the function over the mesh. One way to do it is this: 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% maxOverMesh() written by Leon Q. Brin 21 January 2013   %
% INPUT: Interval [a,b]; function f; and number of        %
%        subintervals n.                                  %
% OUTPUT: maximum value of the function over the end      %
%         points of the subintervals.                     %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function ans = maxOverMesh(f,a,b,n)
  ans = f(a);
  for i=1:n
    x = (i*b + (n-i)*a)/n;
    F = f(x);
    if (F>ans) ans = F; end%if
  end%for
end%function


It is good practice to preface each function you write with a comment containing a three-point description of the 
function — the name, inputs, and output. If you or anyone else looks at it later, you will have a quick summary of 
how to use the function and for what. 

Whatever value was last assigned to ans when the function completes will be the output of the function. The
function starts by assigning the value of the function at the left end point to ans. Then it loops through the rest of 
the subinterval end points, calculating the value of the function at each one. Each time it finds a value higher than 
ans, it (re-)assigns ans to that value. At the end of the loop, the greatest value of the function has been assigned 
to ans. 

To use a custom function, save it in a .m file with the same name as that of the function. For example, the
maxOverMesh() function would be saved in a file named maxOverMesh.m. Then your custom function can be called
just as any built-in Octave function as long as the .m file is saved in the same directory in which the program using
it is saved. Or, if using it from the command line, the working directory of Octave (the one from which Octave was
started, unless explicitly changed during your session) must be the directory in which the .m file is saved:

octave:1> maxOverMesh(inline('(x^2-6*x+8)*exp(x)'),0,4,99)
ans = 8.6728
octave:2> f = inline('(x^2+3*x-5)/(x^2-3*x+5)')
f = f(x) = (x^2+3*x-5)/(x^2-3*x+5)
octave:3> maxOverMesh(f,-5,5,225)
ans = 2.6362


maxOverMesh.m may be downloaded at the companion website.
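More recent Octave documentation discourages inline() in favor of anonymous functions, so the same calls might also be written with the @ syntax. The sketch below repeats the function so it can run on its own, with two liberties that are assumptions of this example rather than the text's code: the output variable is renamed best (an arbitrary name, chosen to keep it distinct from Octave's automatic ans), and the function lives in a script file instead of its own .m file.

```octave
1;  % a leading non-function statement makes Octave treat this file as a script

% Same logic as maxOverMesh() in the text; only the output name differs.
function best = maxOverMesh(f, a, b, n)
  best = f(a);
  for i = 1:n
    x = (i*b + (n-i)*a)/n;       % i-th subinterval end point
    F = f(x);
    if (F > best) best = F; end%if
  end%for
end%function

g = @(x) (x^2 - 6*x + 8)*exp(x);   % anonymous function, no string needed
disp(maxOverMesh(g, 0, 4, 99))     % compare with the first call in the session above

h = @(x) 1 - (x - 1)^2;            % peak value 1 at x = 1
disp(maxOverMesh(h, 0, 2, 4))      % x = 1 is a mesh point, so this displays 1
```

The second call is a handy sanity check because the answer is known ahead of time. Keep in mind the function only examines mesh points, so a peak that falls between them will be underestimated.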
Recursive functions

Thinking recursively, what would you say if I asked you what 10! was? Think about it for a moment before reading 
on. That’s right! 10 factorial is just 10 times 9!: 

10! = 10 • 9 • 8 • 7 • 6 • 5 • 4 • 3 • 2 • 1

    = 10 • (9 • 8 • 7 • 6 • 5 • 4 • 3 • 2 • 1)

    = 10 • (9!).


No need to come up with a number. Just a recursive idea, because of course the idea works just as well for 9!, and 
so on ... up to (or should I say down to?) a point. At what point is it no longer true that n! = n • (n - 1)!? When
n = 0. We need to specify that 0! = 1 and not rely on recursive thinking in this case. But only this case! 

Let’s see how this recursive calculation works for 5!. According to the recursion, 5! = 5 • 4!. But 4! = 4 • 3! 
so we have 5! = 5 • (4 • 3!). But 3! = 3 • 2! so we now have 5! = 5(4(3 • 2!)). Continuing, 2! = 2 • 1! = 2 • 1 • 0! 
so we now have 5! = 5(4(3(2(1 • 0!)))). And now the recursion stops and we simply plug in 1 for 0! to find out 
that 5! = 5(4(3(2(1(1))))). Maybe you were expecting 5 • 4 • 3 • 2 • 1 for a final result instead. Of course you get 
120 either way, so from the standpoint of getting things right, either way is fine. Pragmatically, the point is moot. 
Computing factorials recursively is dreadfully inefficient and impossible beyond the maximum depth of recursion 
for the programming language in use, so should never be used in practice anyway. Its only value is as an exercise 
in recursive thinking and programming. 
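If recursion is off the table in practice, what would one use instead? A plain loop does the job, as does Octave's built-in prod() or factorial(). The sketch below is only an illustration; the variable name loopFact is made up for the example.

```octave
n = 10;

% Loop version: multiply the running product by 1, 2, ..., n in turn.
loopFact = 1;
for k = 1:n
  loopFact = loopFact*k;
end%for

disp(loopFact)        % displays 3628800
disp(prod(1:n))       % the same product as a one-liner
disp(factorial(n))    % Octave's built-in factorial
```

None of these three needs a stack of pending function calls, so none of them runs into the recursion depth limit.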

Generally, a recursive function will look like this: 

function ans = recFunction(input1, input2, ...)
  if (recursion does not apply)
    return appropriate ans
  else
    return recFunction(i1, i2, ...)
  end%if
end%function

Determining whether the recursion applies is the first item of business. If not, an appropriate output must be 
supplied. Otherwise, the recursive function simply calls itself with modified inputs. Since the recursive (wise-guy) 
definition of n! is n • (n - 1)! and applies whenever n > 0, and 0! = 1, the recursive factorial function might look
like this: 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% recFactorial() written by Leon Q. Brin 21 January 2013  %
% is a recursively defined factorial function.            %
% INPUT: nonnegative integer n.                           %
% OUTPUT: n!                                              %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function ans = recFactorial(n)
  if (n==0)
    ans = 1;
  else
    ans = n*recFactorial(n-1);
  end%if
end%function


Note the == when checking if n equals 0. This is not a typographical error. This is very important. All programming
languages must distinguish between assignments and conditions. On paper, it may seem natural to write x = 3 
when you want to set x equal to 3. It may also seem natural to write “if x = 3, everything is good.” We use 
the “equation” x = 3 exactly the same way on paper to mean two very different things. When we set x = 3 we 
are making a statement, or assignment of the value 3 to the variable x. But when we write “if x = 3 ...” we are 
making a hypothetical statement, or a conditional statement. The value of x is unknown. In Octave the distinction 
is made by using a single equals sign, =, to mean assignment and two equals signs, ==, to mean conditional equals. 
recFactorial.m may be downloaded at the companion website.
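A few lines at the Octave prompt make the difference between = and == concrete. This is a small illustration only, with the variable x and the message chosen for the example.

```octave
x = 3;            % assignment: x now holds the value 3
disp(x == 3)      % condition: is x equal to 3? displays 1 (true)
disp(x == 4)      % displays 0 (false); the test does not change x
if (x == 3)
  disp('everything is good')
end%if
```

The first line changes x. The others only ask a question about it.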
Exercises

1 . * ■ Write a .m file with a function that takes one input, 
squares it, and returns the result. Your file should 

(a) contain a comment block at the beginning containing
your name, the date, and an explanation
of what the program does and how to use it.

(b) have a function of the form foo(x) in it that returns
the square of its input (argument) x.

Make sure to test your function from the Octave command
prompt.

2 . 6 o The Octave function foo(x) is shown below. 

function res = foo(x)
  if (x<1)
    res = 0;
  else
    half = x/2;
    floorhalf = floor(half);
    if (half == floorhalf)
      res = 0 + foo(floorhalf);
    else
      res = 1 + foo(floorhalf);
    endif
  endif
endfunction

(a) Find foo(2).

(b) Find foo(23).

3 . * ■ Write a recursive Octave function that will calculate 



i= 1 


4 . " ■ Write a recursive Octave function that calculates 
$a_n$ for any n > 0 given

$a_0 = 100{,}000$

$a_n = 1.05a_{n-1} - 1200$, n > 0.

5. The Fibonacci sequence, $(F_n)$, is recursively defined by

$F_{n+1} = F_n + F_{n-1}$, $n \ge 1$
$F_0 = 1$
$F_1 = 1$

so the first few terms are 1, 1, 2, 3, 5, 8.

(a) Write a recursive function that calculates the n th 
Fibonacci number. Your function should have one 
argument, n. 

(b) Write a function that uses a for loop to calculate 
the n th Fibonacci number. Your function should 
have one argument, n. 

(c) Write a program that calls the function from 5a
to calculate $F_{30}$.

(d) Write a program that calls the function from 5b
to calculate $F_{30}$.

(e) Which code is simpler (recursive or nonrecursive)?

(f) Which code is faster?

(g) Which code is more accurate?

NOTE: $F_{30} = 1346269$.

6. Let the sequence ( a n ) be defined by 

1 1 2 1 

a n+ i = - | 5 a n — 30 a n + 25 1 , n > 1 

17 + 2^7 

a ° — c • 

5 

(a) Calculate $a_1$, $a_2$, and $a_3$ exactly.

(b) Find $a_{20}$ and $a_{51}$ exactly.

(c) Write a recursive function that calculates the nth
term of the sequence. Your function should have
one argument, n. Write a program that calls this
function to calculate $a_1$, $a_2$, $a_3$, $a_{20}$, and $a_{51}$.

(d) Write a function that uses a for loop to calculate
the nth term of the sequence. Your function
should have one argument, n. Write a program
that calls this function to calculate $a_1$, $a_2$, $a_3$, $a_{20}$,
and $a_{51}$.

(e) Which code is simpler (recursive or nonrecursive)? 

(f) Which function is faster? 

(g) Which code is more accurate, and why? 

(h) Which function is better, and why? 

(i) Do you trust either function to calculate $a_{600}$ accurately?
If not, why not?

7 . Trominos, part 1 . ^ 

(a) Recursively speaking, how many trominos are
needed to tile a $2^n \times 2^n$ grid, save one corner?

(b) What is the greatest (integer) value of n for which 
the recursive definition does not apply? 

(c) For the value of n of part 7b, how many trominos
are needed?

8. * • Trominos, part 2 . ^ 

(a) Write a recursive Octave function for calculating
the number of trominos needed to tile a $2^n \times 2^n$
grid, save one corner.

(b) Use your function to verify that 349,525 trominos
are needed to tile a 1024 x 1024 grid, save one
corner.

9 . The Tower of Hanoi, part 1 . The Tower of Hanoi is 
a game played with a number of different sized disks 
stacked on a pole in decreasing size, the largest on the 
bottom and the smallest on top. There are two other 
poles, initially with no disks on them. The goal is to 
move the entire stack of disks to one of the initially 
empty poles following two rules. You are allowed to 
move only one disk at a time from one pole to another. 
You may never place a disk upon a smaller one. 

(a) Starting with a stack of three disks, what is the 
minimum number of moves it takes to complete 
the game? Answer this question with a number. 
(b) Starting with a stack of four disks, what is the
minimum number of moves it takes to complete
the game?

i. Answer this question recursively.

ii. Answer this question with a number based
on your recursive answer.

10. The Tower of Hanoi, part 2. bl

(a) Starting with a “stack” of one disk, what is the
minimum number of moves it takes to complete
the game?

(b) ^ • Use your answer to (a) plus a generalization
of your answer to question 9(b)i to write a recursive
Octave function for calculating the minimum
number of moves it takes to complete the game
with a stack of n disks.

(c) * ■ Use your Octave function to verify that it
takes a minimum of 1023 moves to complete the
game with a stack of 10 disks.

11. The Tower of Hanoi, part 3. The Tower of Hanoi with
adjacency requirement. Suppose the rules of The Tower
of Hanoi are modified so that each disk may only be
moved to an adjacent pole, and the goal is to move the
entire stack from the left-most pole to the right-most
pole.

(a) What is the minimum number of moves it takes
to complete the game with a “stack” of one disk?

(b) Find a recursive formula for the minimum number
of moves it takes to complete the game with
a stack of n disks, n > 1.

(c) * ■ Write a recursive Octave function for the minimum
number of moves to complete the game with
a stack of n disks.

(d) * ■ Use your Octave function to compute the minimum
number of moves it takes to complete the
game with a stack of 5 disks. 10 disks.

12. Stirling numbers of the second kind, part 1. Let S(n, k)
be the number of ways to partition a set of n elements
into k nonempty subsets. A partition of a set A is a
collection of subsets of A such that each element of
the set A must be an element of exactly one of the
subsets. The order of the subsets is irrelevant as the
partition is a collection (a set of sets). For example, the
partition {{1}, {2, 3}, {4}} is a partition of {1, 2, 3, 4}.
{{4}, {1}, {2, 3}} is the same partition of {1, 2, 3, 4}.

(a) Find S(10, 1). [S]

(b) Find S(3, 2).

(c) Find S(4, 3).

(d) Find S(4, 2).

(e) Find S(8, 8).

13. Stirling numbers of the second kind, part 2.

(a) Find S(n, 1).

(b) Find S(n, n).

14. Stirling numbers of the second kind, part 3. Let
A = {1, 2, 3, ..., n}. M

(a) How many partitions of A into k nonempty subsets
include the subset {n}? Give an answer in
terms of Stirling numbers of the second kind.

(b) How many partitions of A into k nonempty subsets
do not include the subset {n}? Give an
answer in terms of Stirling numbers of the second
kind. Hint: consider partitions of B =
{1, 2, 3, ..., n - 1} into k nonempty subsets.

15. Stirling numbers of the second kind, part 4.

(a) Use your answers to questions 13 and 14 to derive
a recursive formula with initial conditions for
the number of ways a set of n elements can be
partitioned into k subsets.

(b) * • Write a recursive Octave function that calculates
Stirling numbers of the second kind.

(c) * » Use your Octave function to verify that
S(10, 4) = 34105.

16. A set of blocks contains some that are 1 inch high and
some that are 2 inches high. How many ways are there
to make a stack of blocks 15 inches high? bl

17. A male bee (drone) has only one parent since drones
are the unfertilized offspring of a queen bee. A female
bee (queen) has two parents. Therefore, 0 generations
back, a male bee has one ancestor (the bee himself). 1
generation back, the bee also has 1 ancestor (the bee’s
mother). 2 generations back, the bee has 2 ancestors
(the mother’s two parents). How many direct ancestors
does a male bee have n generations back?

18. Argue that any polygon can be triangulated (covered
with non-overlapping triangles). An example of a triangulation
of a dodecagon follows.

19. In questions 5 and 6, you should have noticed that
the recursive functions were slower than their for loop
counterparts. How many times slower? Why is the Fibonacci
recursion so many more times slower than its
for loop counterpart?

20. Let the sequences $(b_n)$ and $(c_n)$ be defined as follows.

$b_0 = \frac{1}{3}$; $b_{n+1} = 4b_n - 1$, $n \ge 0$

$c_0 = \frac{1}{10}$; $c_{n+1} = 4c_n(1 - c_n)$, $n \ge 0$

(a) Write a function that uses a for loop to calculate
the nth term of $(b_n)$. Your function should have
one argument, n.

(b) Write a function that uses a for loop to calculate
the nth term of $(c_n)$. Your function should have
one argument, n.

(c) Write a program that calls these functions to calculate
$b_{30}$ and $c_{30}$. How accurate are these calculations?
HINT: $b_{30} = \frac{1}{3}$ and $c_{30} = .32034$ accurate
to 5 decimal places.

(d) Can you think of a way to make these calculations
more dependable (more accurate)?
Root Finding


2.1 Bisection 


In Section 1.2 (page 12), we claimed that “$T_2(x)$ actually approximates ln(x) to within 0.1 over the interval
[3.296, 13.13]”, with a promise that we would discuss the calculation later. It is now later. First, we rephrase
the claim as “the distance between $T_2(x)$ and ln(x) is less than or equal to 0.1 for all x ∈ [3.296, 13.13].” In other
words,

$$|T_2(x) - \ln(x)| \le \frac{1}{10} \quad \text{for all } x \in [3.296, 13.13].$$

One way to begin solving this inequality is to consider the pair of equations $T_2(x) - \ln(x) = \pm\frac{1}{10}$. With a focus on
solving

$$T_2(x) - \ln(x) = \frac{1}{10} \qquad (2.1.1)$$

recall that $T_2(x) = 2 + \frac{x - e^2}{e^2} - \frac{(x - e^2)^2}{2e^4}$. We are thus looking to solve the equation

$$2 + \frac{x - e^2}{e^2} - \frac{(x - e^2)^2}{2e^4} - \ln(x) = \frac{1}{10}.$$


Finally, having written the equation in full detail, it should come as no surprise that we will not be solving for
x exactly. There is no analytic method for solving such an equation. Generally, equations with both polynomial
terms and transcendental terms will not be solvable. However, from the graph in Figure 1.2.2, we can get a first
approximation of the solution. We are looking for the place where $T_2(x)$ exceeds ln(x) by 0.1. Since the two
graphs essentially overlap at x = 6, we might aver that $T_2(6)$ exceeds ln(x) by less than 0.1 there. Since there is a
reasonably large gap between the graphs at x = 2, we might also aver that $T_2(2)$ exceeds ln(x) by more than 0.1
there. In other words, $T_2(2) - \ln(2) > \frac{1}{10}$ while $T_2(6) - \ln(6) < \frac{1}{10}$. Since $T_2(x) - \ln(x)$ is continuous on the interval
[2, 6], the Intermediate Value theorem guarantees there is a value c ∈ (2, 6) such that $T_2(c) - \ln(c) = \frac{1}{10}$. It is this
value of c we are after. And we know it is between 2 and 6. It’s a start, but we can do better!

What about 4? Well, $T_2(4) - \ln(4) \approx .04986 < 0.1$, so now we know $T_2(4)$ exceeds ln(4) by less than 0.1. Now
the Intermediate Value theorem tells us that c is between 2 and 4 ($T_2(2)$ exceeds ln(x) by more than 0.1). Shall we
check on x = 3? Yes. $T_2(3) - \ln(3) \approx .131 > 0.1$, so now we know $T_2(3)$ exceeds ln(3) by more than 0.1. Recapping,
$T_2(4) - \ln(4) < 0.1$ while $T_2(3) - \ln(3) > 0.1$. By the Intermediate Value theorem again, we know c is between 3 and
4. And we may continue the process, limited only by our patience. This is the process we call the bisection method:


1. Identify an interval [a, b] such that either a or b overshoots the mark while the other undershoots it.

2. Calculate the midpoint, m, of the identified interval.

3. If a and m both overshoot or both undershoot the mark, the desired value lies in [m, b].

4. If b and m both overshoot or both undershoot the mark, the desired value lies in [a, m].

5. Return to step 2 using the newly identified interval.
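The five steps can be tried out directly in Octave. The following is a rough sketch, not a finished program: the names E2, T2, overshoots, and mids are made up for the illustration, and the loop simply runs four times rather than deciding for itself when to stop. Here T2 is the Taylor polynomial of ln(x) about e^2 recalled above, and “overshooting the mark” means T2(x) - ln(x) > 0.1.

```octave
E2 = exp(2);
T2 = @(x) 2 + (x - E2)/E2 - (x - E2)^2/(2*E2^2);
overshoots = @(x) (T2(x) - log(x)) > 0.1;   % true where the gap exceeds 0.1

a = 2;  b = 6;                % step 1: a overshoots, b undershoots
mids = zeros(1, 4);
for k = 1:4
  m = (a + b)/2;              % step 2: the midpoint
  mids(k) = m;
  if (overshoots(a) == overshoots(m))
    a = m;                    % step 3: the desired value lies in [m, b]
  else
    b = m;                    % step 4: the desired value lies in [a, m]
  end%if
end%for                       % step 5: back to step 2
disp(mids)                    % the midpoints 4, 3, 3.5, 3.25
```

After the fourth pass the interval has shrunk from [2, 6] to [3.25, 3.5], and the desired value is still trapped inside.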
Figure 2.1.1: + indicates $T_2(x) - \ln(x) > \frac{1}{10}$ and - indicates $T_2(x) - \ln(x) < \frac{1}{10}$.

  +      +      +      -      -      -
  |      |      |      |      |      |
  2      3     3.25   3.5     4      6
        m_2    m_4    m_3    m_1


Using a + sign for values of x for which $T_2(x) - \ln(x)$ overshoots the desired value $\frac{1}{10}$ and a - sign for values of x
for which $T_2(x) - \ln(x)$ undershoots the desired value $\frac{1}{10}$, we may diagram this procedure, including the next two
iterations, as in Figure 2.1.1. We might also reproduce the calculations in a table:

a     m      b      T_2(a) - ln(a)    T_2(m) - ln(m)    T_2(b) - ln(b)
2     4      6      .3116             .04986            .002582
2     3      4      .3116             0.131             .04986
3     3.5    4      0.131             0.0824            .04986
3     3.25   3.5

No matter how the procedure is understood, the sequence of approximations 

4, 3, 3.5, 3.25, ... 

is produced. What is the next value? Answer on page 45. 

Not only do we have a sequence of numbers approaching the solution, we know for certain that 4 is accurate to 
within 2 units of the exact value. 3 is accurate to within 1 unit. 3.5 is accurate to within 0.5 units. And 3.25 is 
accurate to within 0.25 units. In general, each approximation is accurate to within half the length of the interval 
from which it was computed as midpoint. After all, the exact value is guaranteed to lie within the interval. The 
farthest the midpoint can possibly be from the exact value is half the length of the interval. 
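Written as a formula, the observation might look like this. The notation is introduced here for convenience and is an assumption of this note: $m_k$ is the k-th midpoint, c is the exact value, and [a, b] is the starting interval. Since each interval is half as long as the one before it,

```latex
|m_k - c| \le \frac{b - a}{2^k}, \qquad k = 1, 2, 3, \ldots
```

With [a, b] = [2, 6], the right side gives 2, 1, 0.5, and 0.25 for k = 1, 2, 3, 4, matching the accuracies just listed. Read the other way around, a midpoint is guaranteed to be within a tolerance tol of c as soon as $2^k \ge (b - a)/tol$. For example, starting from [2, 6] with $tol = 10^{-6}$, that takes k = 22 since $2^{22} = 4194304 \ge 4 \times 10^6$ while $2^{21}$ falls short.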

Though the method works perfectly well as described, normally the equation to be solved is simplified so that 
one side is zero. In that way, the other side can be thought of as a function whose roots are desired. Plus, it 
simplifies the implementation of the method slightly. For example, we would consider solving the equation 

$$T_2(x) - \ln(x) - \frac{1}{10} = 0$$

instead of 2.1.1. Then the procedure boils down to finding a root of $f(x) = T_2(x) - \ln(x) - \frac{1}{10}$. This is why this
method is called a root-finding method. It is used to find zeros, or roots, of functions. In this light, we might 
summarize the first 8 iterations of this procedure as follows: 


a          m          b          f(a)    f(m)    f(b)
2          4          6          > 0     < 0     < 0
2          3          4          > 0     > 0     < 0
3          3.5        4          > 0     < 0     < 0
3          3.25       3.5        > 0     > 0     < 0
3.25       3.375      3.5        > 0     < 0     < 0
3.25       3.3125     3.375      > 0     < 0     < 0
3.25       3.28125    3.3125     > 0     > 0     < 0
3.28125    3.296875   3.3125





Notice two things. The actual values of f(a), f(m), and f(b) are not needed. Only their sign is important because
all we need to do is maintain one endpoint where the function is greater than 0 (overshoots) and one where the 
function is less than 0 (undershoots). Furthermore, the f(a) and f(b) columns are not strictly necessary either. If 
the procedure is carried out faithfully, they will never change sign. In fact, that’s what it means to carry out the 
procedure faithfully! In steps 3 and 4, you choose which subinterval to keep by maintaining opposite signs of the 
function on opposite endpoints. 

As the last line indicates, the desired value is approximately 3.296 as promised. The other value, 13.13, is 
determined by finding a root of the function $g(x) = T_2(x) - \ln(x) + \frac{1}{10}$. Give it a shot! Start with a = 10 and
b = 14, for example. Solution on page 45. 

Though it works, the only real point of carrying out the procedure using a table is to make sure you understand
exactly how it works. If we were actually to use the method in practice, we would write a short computer program
instead. Computers are very good at repetitious calculations, something at which humans are not particularly
adept. In this procedure, we need to calculate a midpoint, decide whether this midpoint should then become the
left or right endpoint, make it so, and repeat.

That leaves only one question — how many repetitions, or iterations, should we compute? And that depends on 
the user. Perhaps an answer to within $10^{-2}$ of the exact value will suffice, and maybe only $10^{-6}$ accuracy will do.
The program we write should be flexible enough to calculate the answer to whatever accuracy is desired, within 
reason. With that in mind, here is some pseudo-code for the bisection method. 

The Bisection Method (pseudo-code) 

Though technically not necessary for coding, when we can, we will preface each method’s pseudo-code with math- 
ematical assumptions that guarantee success. The implication is that if the method is run in a situation where the 
assumptions are not met, then the method should not be expected to provide dependable results. It may or may 
not give useful information. The old adage “garbage in... garbage out” applies! 

Assumptions: f is continuous on [a, b]. f(a) and f(b) have opposite signs.

Input: Interval [a, b]; function f; desired accuracy tol; maximum number of iterations N.

Step 1: Set err = |b - a|; L = f(a);

Step 2: For j = 1 ... N do Steps 3-5:

Step 3: Set m = (a + b)/2; M = f(m); err = err/2;

Step 4: If M = 0 or err < tol then return m;

Step 5: If LM < 0 then set b = m; else set a = m and L = M;

Step 6: Print “Method failed. Maximum iterations exceeded.”

Output: Approximation m within tol of exact root, or message of failure.
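To see how little separates pseudo-code from working code, here is one possible translation into Octave. It is a sketch, not the text's own implementation: the function name bisect(), the choice to return NaN after the failure message, and the use of a script file are all assumptions of this example. The step numbers in the comments refer to the pseudo-code.

```octave
1;  % a leading non-function statement makes Octave treat this file as a script

% bisect() is a hypothetical name chosen for this sketch.
% INPUT: function f; interval [a,b]; desired accuracy tol; maximum number
%        of iterations N.
% OUTPUT: approximation m within tol of a root, or NaN on failure.
function m = bisect(f, a, b, tol, N)
  err = abs(b - a);  L = f(a);                 % Step 1
  for j = 1:N                                  % Step 2
    m = (a + b)/2;  M = f(m);  err = err/2;    % Step 3
    if (M == 0 || err < tol)                   % Step 4
      return;
    end%if
    if (L*M < 0)                               % Step 5
      b = m;
    else
      a = m;  L = M;
    end%if
  end%for
  disp('Method failed. Maximum iterations exceeded.');   % Step 6
  m = NaN;   % assumption of this sketch: signal failure with NaN
end%function

E2 = exp(2);
f = @(x) 2 + (x - E2)/E2 - (x - E2)^2/(2*E2^2) - log(x) - 0.1;
r = bisect(f, 2, 6, 1e-6, 100);
printf('%.6f\n', r)     % compare with the value 3.296 found by hand
```

Calling bisect(f, 2, 6, 1e-6, 3) instead shows the fallback exit at work: three iterations are not nearly enough, so the failure message is printed.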

As noted earlier, this method should calculate a midpoint (Step 3), decide whether this midpoint should then 
become the left or right endpoint (Step 5), make it so (Step 5), and repeat some number of times (Steps 1, 2, and 4). 
Much of the code is dedicated to determining when to stop. This is typical of numerical methods. The calculations 
are half the battle. Controlling the calculations is the other half. If we didn’t have to worry about stopping, the 
pseudo-code might look something like this: 

Step 1: Set L = f(a);

Step 2: Set m = (a + b)/2; M = f(m);

Step 3: If LM < 0 then set b = m; else set a = m and L = M;

Step 4: Go to Step 2.

There would be no need for j, err, tol, or N, making the algorithm quite a bit simpler. Of course, programmed 
this way, the program would never stop, so j, err, tol, and N, are indeed necessary. Nonetheless, this pseudo-code 
without the ability to stop is important. It can be thought of as the guts of the program. This is the code that 
executes the method. Sometimes it is easiest to start with the guts and then add the controls afterward. 

As for determining whether the midpoint should become the left or right endpoint, Step 5 (Step 3 of the 
guts) uses a somewhat slick method. By slick, I mean short, efficient, and not immediately obvious. The sign of 
LM = f(a) · f(m) is checked. If it is negative (LM < 0) then m should become the right endpoint (should replace 
b) because this means f(a) and f(m) have opposite signs. That’s the only way LM can be negative. On the other 
hand, if LM > 0 then we know f(a) and f(m) have the same sign, so m should become the left endpoint (should 
replace a). In Step 3 the midpoint is calculated without any fanfare. 

The rest of the code is there to make sure the program doesn’t do more than necessary and doesn’t end up 
spinning its wheels indefinitely. It is important to be able to separate, at least in your mind, the guts of the program 
from the stopping logic. As for the stopping logic, in Step 4, we stop if err < tol as we should. But we also check 
the unlikely event that M = 0 in which case we happened to hit the root exactly so should quit. Though it could 
be argued overkill to set a maximum number of iterations, N, in this program, it’s a good habit to get into. Some 
numerical methods provide no guarantee the required tolerance will ever be reached. For these methods, a fallback 
exit criterion is needed. Also, if tol were accidentally set to a negative value, it would certainly never be reached. 
The algorithm would have no way to stop without N. 

Analysis of the bisection method 

There are two good reasons to study the bisection method. First, its assumptions for guaranteed success are much 
simpler to verify than those of other methods. Even so, be somewhat cautious. Faithful execution of any numerical 
method is subject to proper programming, accurate computation, and proper input. Programmers and users are 
not infallible. Nor are computers. Remember the lessons of Section 1.1. At the same time you should be wary of 
the results, you should temper your skepticism with a good dose of confidence in the method. It is only in rare 
circumstances that the computer will be the source of any problems. 

Second, error analysis is straightforward. Let $m_1$ be the midpoint of $[a, b]$. Let succeeding midpoints be 
$m_2$, $m_3$, $m_4$, and so on. Then the Intermediate Value Theorem guarantees $|m_j - p| \le \frac{b-a}{2^j}$ for some root $p$ of 
$f(x)$. As we learned in section 1.3, this means the sequence $(m_n)$ converges to $p$ with linear order, and rate of 
convergence $O\left(\frac{1}{2^n}\right)$. This method should be considered slow to converge because it does so with linear order. But 
among those methods with linear order, it should be considered fast. The error decays exponentially, faster than 
any polynomial decay. 
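
The error bound can be turned around to estimate how many iterations a given tolerance requires: $(b-a)/2^j \le tol$ holds as soon as $j \ge \log_2((b-a)/tol)$. The following is a minimal Octave sketch of that arithmetic, assuming the interval [2, 6] and tolerance $10^{-4}$; the variable names are illustrative and not taken from the text.

```octave
% Smallest j with (b-a)/2^j <= tol, read off from the bisection error bound.
a = 2; b = 6; tol = 10^-4;
j = ceil(log2((b - a)/tol));   % log2(40000) is about 15.29, so j is 16
bound = (b - a)/2^j;           % guaranteed error after j iterations
printf("%d iterations guarantee an error of at most %g\n", j, bound);
```

Only the length of the starting interval and the tolerance enter the calculation; the function itself plays no role in the bound.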


Key Concepts 

The Intermediate Value Theorem: Suppose f is a continuous function on [a, b] and y is between f(a) and f(b). 
Then there is a number c between a and b such that f(c) = y.¹ 

Iteration: (1) Repeating a computation or other process, using the output of one computation as the input of the 
next. 

Iteration: (2) Any of the intermediate results of an iteration. Also called an iterate. 

The bisection method: Produces a sequence of approximations $(m_j)$ that converges to some root in $[a, b]$. 

Error bound for the bisection method: The error of approximation $m_j$ is no more than $\frac{b-a}{2^j}$. That is, 
$|m_j - p| \le \frac{b-a}{2^j}$ for some root $p$ of $f(x)$. 

Convergence for the bisection method: The bisection method converges with linear order and has rate of 
convergence $O\left(\frac{1}{2^n}\right)$. 


Octave 

Roughly half the work in writing pseudo-code for the bisection method was dedicated to the logic of the method, 
that is, the determination of when to stop. In programming, this type of logic is handled by if then [else] statements, 
and variations thereof. It is common practice in programming to use square brackets to denote something that is 
optional. So the template if then [else] should be read to mean that logic is handled by if then statements or 
if then else statements. The exact syntax looks like this: 

if (condition) 
  execute code here 
[else 
  execute code here] 
end%if 

Again, the square brackets indicate optional code. 

The if then statement works as you might imagine. Note that Octave has no then keyword; the end of the 
condition plays that role. In the if then form of the statement, all code between the condition and end is 
executed whenever the condition is true. It is skipped whenever the condition is false. The if then else form 
of the statement is similar. All code between the condition and else is executed whenever the condition 
is true. The code between else and end is skipped in this case. Exactly the reverse happens when the condition is 
false. The code between the condition and else is skipped while the code between else and end is executed. The simplest 
use of an if then else statement might look like this. 

if (n>10) 
  disp('n is big') 
else 
  disp('n is small') 
end%if 

¹ The word “between” in the Intermediate Value Theorem can be interpreted as inclusive or exclusive of the endpoint 
values as long as the same interpretation is made for each instance of the word. 

In Octave, if then [else] statements are written almost exactly as they are in pseudo-code. In fact, much of 
the pseudo-code in this text will translate nearly verbatim into Octave. One notable exception is the symbol used 
in the condition. Octave requires a boolean operator in the condition. That is, an operator that will evaluate to 
either true or false. The = operator assigns a value to a variable. It is not a boolean operator so should not be 
used in an if condition. Instead, == (two equals signs) should be used. This table summarizes the six most common 
boolean operators in Octave. 


Comparison                                   Operator 

greater than, less than                      >   < 
greater than or equal, less than or equal    >=  <= 
equal                                        == 
not equal                                    !=  (or ~=) 


If you needed to check if x ≥ 0, you would use if (x>=0) in Octave. If you needed to check if t equaled 1, you 
would use if (t==1) in Octave. And so on. Logical operators are often needed as well. 


Logical Operator    Octave Code 

and                 && 
or                  || 


For example, if you need to check whether x is between a and b, as in a ≤ x < b, a logical operator is needed. In 
this case, we need logical “and” since a ≤ x < b means a ≤ x and x < b. The Octave code would be if (a<=x && 
x<b) or something logically equivalent. 
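
As a small illustration of the comparison and logical operators, the following sketch tests whether a value lies in the half-open interval [a, b). The variable names inside and outside and the sample values are chosen here for illustration only.

```octave
a = 2; b = 6;

x = 2;
inside = (a <= x && x < b);    % true: 2 <= 2 and 2 < 6 both hold

x = 6;
outside = (a <= x && x < b);   % false: 6 < 6 fails, so the "and" fails

if (inside && !outside)
  disp('2 is in [2,6) but 6 is not');
end%if
```

Because && requires both comparisons to be true, the right endpoint is excluded while the left endpoint is included.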

Technically, an if then statement is concluded with an end statement. However, to emphasize the type of 
statement being ended, we will make a habit of ending an if then statement with end%if and ending a for loop 
with end%for. The %if and %for are just comments since they start with %. Consequently, they are not strictly 
necessary, but they may aid in the readability of your code, especially when you have nested constructs. When you 
have an if statement inside a for loop or vice versa, using end to end both of them is not as informative as using 
end%if and end%for. 
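
To see why the labels help, here is a minimal sketch (not from the text) with an if statement nested inside a for loop. It counts the even integers from 1 to 10; the variable name count and the use of mod are assumptions made for this illustration.

```octave
count = 0;
for k = 1:10
  if (mod(k, 2) == 0)   % k is even when division by 2 leaves no remainder
    count = count + 1;
  end%if
end%for
disp(count);
```

With two bare end statements, a reader would have to match them to their openers by indentation alone.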

An Octave program to find a root of $f(x) = 2 + \frac{x-e^2}{e^2} - \frac{(x-e^2)^2}{2e^4} - \ln(x) - \frac{1}{10}$ between 2 and 6 to within $10^{-4}$ 
using the bisection method with a maximum of 100 iterations might look like this. 

f = inline('2+(x-exp(2))/exp(2)-(x-exp(2))^2/(2*exp(4))-log(x)-1/10'); 
a=2; 
b=6; 
err=b-a; 
L=f(a); 
for i=1:100 
  m=(a+b)/2; 
  M=f(m); 
  err=err/2; 
  if (M==0 || err<=10^-4) 
    disp(m); 
    return; 
  end%if 
  if (L*M<0) 
    b=m; 
  else 
    a=m; 
    L=M; 
  end%if 
end%for 
disp('Method failed. Maximum iterations exceeded.'); 

This code would produce the correct result, 3.2952. Compare this code to the pseudo-code. You will see the main 
difference is syntax. However, there is one major disadvantage to writing the code this way. In order to change the 
function, the endpoints, the tolerance, or the maximum number of iterations, the code needs to be modified in just 
the right place. That is no real disadvantage if you never need to run the bisection method again. But, generally, 
we should imagine that we will be running the methods we write many times over with different inputs. Or that we 
will be handing our code over to someone else to run many times over with different inputs. Imagine me handing 
you this code and asking you to find a root of $f(x) = \cos x - x$ between 0 and 3 to within $10^{-6}$. It is not good 
practice to hard code the inputs to a method. Instead, they should be given as inputs to a programmed function. In 
Octave, this is done in a .m file. That doesn’t mean that we will simply take the code as written and save it in a .m 
file. The .m file will assume that the inputs (interval [a, b]; function f; desired accuracy tol; maximum number of 
iterations N) will be supplied from another source: the user. The code inside the .m file should execute properly 
regardless of the (yet unknown) inputs. The syntax for an Octave function is: 

function result=name(input1, input2, ...) 
  execute these lines 
end%function 


function is a keyword that tells Octave a function is to be defined. result is the name of the variable that holds 
the answer, or result, of the function. name is the name of the function. It must also be the name of the .m file. A 
completed bisection.m file might look like this: 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
% Bisection method written by Leon Q. Brin 09 July 2012      % 
% Purpose: Implementation of the bisection method            % 
% INPUT: Interval [a,b]; function f; tolerance tol; and      % 
%        maximum number of iterations maxits.                % 
% OUTPUT: root res to within tol of exact or message of      % 
%         failure.                                           % 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
function res=bisection(a,b,f,tol,maxits) 
  err=abs(b-a);   % |b - a|, as in Step 1 of the pseudo-code 
  L=f(a); 
  for i=1:maxits 
    m=(a+b)/2; 
    M=f(m); 
    err=err/2; 
    if (M==0 || err<=tol) 
      res=m; 
      return; 
    end%if 
    if (L*M<0) 
      b=m; 
    else 
      a=m; 
      L=M; 
    end%if 
  end%for 
  res='Method failed. Maximum iterations exceeded.'; 
end%function 


Writing the code this way not only makes it easily reusable. It is also simpler! There is no need to worry about 
which function’s root is desired, over what interval, and so on. And it more closely resembles the pseudo-code. 
Once written and properly functioning, it can be saved as a .m file and never be looked at again (except for study). 
It just works! If you hand it off to someone to use, they should be able to use it without modification. bisection.m 
may be downloaded at the companion website. Now finding a root of $f(x) = 2 + \frac{x-e^2}{e^2} - \frac{(x-e^2)^2}{2e^4} - \ln(x) - \frac{1}{10}$ between 
2 and 6 to within $10^{-4}$ using the bisection method with a maximum of 100 iterations might look like this. 

octave:9> f = inline('2+(x-exp(2))/exp(2)-(x-exp(2))^2/(2*exp(4))-log(x)-1/10'); 
octave:10> bisection(2,6,f,10^-4,100) 
ans = 3.2952 

After bisection.m is written, the bisection() function becomes part of the Octave language. It can be called just 
like any built-in function. As a second example, we can find a root of f(x) = cos x - x between 0 and 3 like so: 

octave:12> bisection(0,3,inline('cos(x)-x'),10^-5,100) 
ans = 0.73909 
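
One consequence of this design is that the value returned is either a number or a message, so a caller may want to check which one came back. The sketch below is an illustration rather than part of the text's code: it repeats a compact version of the function so that it runs on its own, passes an anonymous function (an assumption made here in place of inline), and uses the built-in ischar to tell the two outcomes apart.

```octave
1;  % a leading statement makes Octave treat this file as a script

% Compact version of the bisection function, repeated so the sketch is self-contained.
function res = bisection(a, b, f, tol, maxits)
  err = abs(b - a);
  L = f(a);
  for i = 1:maxits
    m = (a + b)/2;
    M = f(m);
    err = err/2;
    if (M == 0 || err <= tol)
      res = m;
      return;
    end%if
    if (L*M < 0)
      b = m;
    else
      a = m;
      L = M;
    end%if
  end%for
  res = 'Method failed. Maximum iterations exceeded.';
end%function

g = @(x) cos(x) - x;                      % anonymous function handle

r_ok  = bisection(0, 3, g, 10^-5, 100);   % enough iterations: a number comes back
r_bad = bisection(0, 3, g, 10^-5, 3);     % too few iterations: a message comes back

results = {r_ok, r_bad};
for k = 1:2
  r = results{k};
  if (ischar(r))
    printf("failure: %s\n", r);
  else
    printf("root is approximately %.5f\n", r);
  end%if
end%for
```

Returning a string on failure keeps the function simple, at the cost that the caller must not assume the result is numeric.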


Exercises 

1. Write an Octave function implementing the bisection method as shown on the facing page. Save it as a 
.m file for future use. 

2. Use the Intermediate Value Theorem to show that the 
function has a root in the indicated interval. 

(a) $f(x) = 3 - x - \sin x$; $[2, 3]$ 

(b) $g(x) = 3x^4 - 2x^3 - 3x + 2$; $[0, 1]$ 

(c) $g(x) = 3x^4 - 2x^3 - 3x + 2$; $[0, 0.9]$ [S] 

(d) $h(x) = 10 - \cosh(x)$; $[-3, -2]$ 

(e) $f(t) = \sqrt{4 + 5\sin t} - 2.5$; $[-6, -5]$ 

(f) g(t) = [21.5,22.5] M 

(g) h(t) = ln(3 sin t) - f ; [1,2] 

(h) $f(r) = e^{\sin r} - r$; $[-20, 20]$ 

(i) $g(r) = \sin(e^r) + r$; $[-3, 3]$ 

(j) $h(r) = 2^{\sin r} - 3^{\cos r}$; $[1, 3]$ 

3. Create a table showing three iterations of the bisection 
method with the function and starting interval indicated in question 2. 

4. Use your bisection.m code to find a root of the function 
in the interval of question 2 to within $10^{-8}$. 

5. Use the bisection method to find $m_3$ for the given function 
on the given interval. Do this without a computer 
program. Just use a pencil, paper, and a calculator. 
You may check your answers with a computer program 
if you wish. 

(a) $f(x) = \sqrt{x} - \cos x$ on $[0, 1]$ 

(b) f(x) = 3(x + l)(x — |)(a: — 1) on [—1.25, 2.5] 

6. Use the Bisection Method to find $m_4$ for $g(x) = x \sin x + 1$ on $[9, 10]$. 

7. Use the bisection method to find $m_3$ for the equation 
$x \cos x - \ln x = 0$ on the interval $[7, 8]$. 

8. Use the bisection method to find a root of $g(x) = \sin x - x^2$ 
between 0 and 1 with absolute error no more than 1/4. 

9. Approximate the root of $g(x) = 2 + x - e^x$ between 1 
and 2 to within 0.05 of the exact value using the bisection method. 

10. There are 21 roots of the function f(x) = cos(x) on the 
interval [0, 65]. To which root will the bisection method 
converge, starting with a = 0 and b = 65? 

11. Find a bound on the number of iterations needed to 
achieve an approximation with accuracy $10^{-3}$ to the 
solution of $x^3 + x - 4 = 0$ on the interval [1,4] using 
the bisection method. Do not actually compute the 
approximation. Just find the bound. 


12. Find a bound on the number of iterations needed to 
achieve an approximation with accuracy $10^{-4}$ to the 
solution of $x^3 - x - 1 = 0$ on the interval [1,2] using 
the bisection method. Do not actually compute the 
approximation. Just find the bound. 

13. The graph of f(x) over the interval [0.75,2] is shown 
below. Notice f(x) has three roots on this interval: 
approximately .795, 1.06, and 1.59. To which of the 
three roots does the bisection method converge if we 
let a = .75 and b = 2? How do you know? 



14. Suppose you are trying to find the root of $f(x) = x - e^{-x}$ 
using the bisection method. Find an integer a 
such that the interval [a, a + 2] is an appropriate one in 
which to start the search. 

15. Find a lower bound on the number of iterations it would 
take to guarantee accuracy of $10^{-10}$ in question 6. 

16. How many steps (iterations) of the bisection method 
are necessary to guarantee a solution with $10^{-10}$ accuracy 
if a root is known to be within [4.5, 5.3]? 

17. Suppose you are using the bisection method on an interval 
of length 3. How many iterations are necessary 
to guarantee accuracy of the approximation to within 
$10^{-6}$? 

18. Suppose a function g satisfies the assumptions of the 
bisection method on the given interval. Starting with 
that interval, how many iterations are needed to ap- 
proximate the root to within the given tolerance? 

(a) $[-7, 10]$; $10^{-6}$ 

(b) $[5, 9]$; $10^{-3}$ 

(c) $[9, 15]$; $10^{-10}$ 

(d) $[-6, -1]$; $10^{-105}$ (assume the computer calculates 
with 300 significant digits so round-off error is not 
a problem) 

19. 1 is a root of $f(x) = \ln(x^4 - x^3 - 7x^2 + 13x - 5)$ that 
can not be found by the bisection method. 

(a) Use a graph of the function near 1 to explain why. 
You may use the Octave code below to produce 
an appropriate graph. 

(b) Run the bisection method on f over the interval 
[0.8, 1.2] anyway. What happens instead of finding the root? 

x=0.8:.01:1.2; 
f=inline("log(x.^4-x.^3-7*x.^2+13*x-5)"); 
plot(x,f(x)) 

20. 4 is a root of $g(x) = |\sin(\pi x)|$ that can not be found 
by the bisection method. 

(a) Use a graph of the function near 4 to explain why. 
You may use the Octave code below to produce 
an appropriate graph. 

(b) Run the bisection method on g over the interval 
[3.5, 4.5] anyway. What happens instead of finding the root? 

x=3.5:.05:4.5; 
f=inline("abs(sin(pi*x))"); 
plot(x,f(x)) 

21. Let $f(x) = \sin(x^2)$. f is continuous on [4,5], but 
f(4) < 0 and f(5) < 0, so the assumptions of the bisection 
method are not met. Nonetheless, using the 
bisection method as described in the pseudo-code on f 
over the interval [4, 5] does produce a root. Explain. 

22. The functions in questions 2e, 2f, and 2g all fail to meet 
the assumptions of the bisection method on the interval 
[—4, —0.5]. For each one, explain how so. 

23. Write an Octave function called collatz that takes 
one integer input, n, and returns 3n + 1 if n is odd and 
n/2 if n is even. Save it as a collatz.m file. Use an if 
then else statement in your function. HINT: Use the 
Octave ceiling function. If ceil(n/2) equals n/2, then 
n must be even (no remainder when divided by 2). Use 
your collatz function to calculate 

(a) collatz (17) 

(b) collatz (10) 

(c) collatz (109) 

(d) collatz (344) 

24. Write your own absolute value function called 
absval (abs is already defined by Octave, so it is best 
to use a different name) that takes a real number input 
and returns the absolute value of the input. Use an 
if then else statement in your function. Save it as 
absval.m and test it on the following computations. 

(a) $|-3|$ 

(b) $|123.2|$ 

(c) k-f | 

(d) $|10 - \pi^2|$ 

25. $f(x) = \sin(x^2)$ has five roots on the interval [7,8]. 
f(7) < 0, f(8) > 0, and f is continuous on [7,8], so 
the assumptions of the bisection method are met. The 
method will converge to a root. 

(a) Use your bisection.m file (Exercise 1) to determine 
which one. 

(b) Find 4 different intervals for which the bisection 
method will converge to the other four roots in 
[7,8]. 


26. The function shown has roots at approximately 2.41, 
4.11, 5.62, 7.01, 8.32, 9.57, 10.78, and 11.94. To which 
root will the bisection method converge with the given 
starting interval? 



(a) [2,3] 

(b) [6,8] 

(c) [2,6] 

(d) [5,9] 

(e) [10, 12] Note: the assumptions of the bisection are 
not met on this interval. Nonetheless, the method 
as outlined in the pseudo-code will converge to a 
root! 

27. Find an interval of length 1 over which the bisection 
method may be applied in order to find a root of 
$f(x) = x^4 - 7.6746x^3 - 40.7477022x^2 + 200.9894434x + 319.0914281$. 

28. The following algorithm is one possible incarnation of 
the bisection method. 

Assumptions: f is continuous on [a, b]. f(a) and f(b) 
have opposite signs. 

Input: Interval [a, b]; function f 

Step 1: For j = 1 ... 15 do Steps 2 and 3: 

Step 2: Set m = (a + b)/2; 

Step 3: If f(a)f(m) < 0 then set b = m; else set a = m; 

Step 4: Print m. 

Output: Approximation m. 

(a) Apply this algorithm to the function $f(x) = (x)(x - 2)(x + 2)$ 
over the interval $[-3, 3]$. Which 
root will this algorithm approximate? 

(b) How accurate is the approximation guaranteed to 
be according to the formula 

$|m_j - p| \le \frac{b-a}{2^j}$? 

(c) How accurate is the approximation in reality? 
Compare this to the bound in (b). 

(d) Modify the algorithm so it will approximate a dif- 
ferent root using the same starting interval. 

(e) Modify the algorithm so it does not use multipli- 
cation. 

29. Use the following pseudo-code to write a slightly different 
implementation of the bisection method. Refer to 
Table 1.1 if you are unsure how to program the quantity 
$\lceil (\ln(b - a) - \ln(TOL))/\ln 2 \rceil$. The while loop is 
discussed on page 61. 

Input function f, endpoints a and b; tolerance TOL. 
Return approximate solution p and f(p) and the 
number of iterations done $N_0$. 

Step 1 Set i = 1; FA = f(a); $N_0 = \lceil (\ln|b - a| - \ln(TOL))/\ln 2 \rceil$; 

Step 2 While i ≤ $N_0$ do Steps 3-6. 

Step 3 Set p = (a + b)/2; FP = f(p); 

Step 4 If FP = 0 then 
Return(p, f(p), $N_0$); STOP. 

Step 5 Set i = i + 1; 

Step 6 If FA · FP > 0 then 
Set a = p; FA = FP; 
else 
Set b = p; 

Step 7 Return(p, f(p), $N_0$); 
STOP. 

(a) Discuss the advantages/disadvantages of this algorithm 
compared to the one on page 42. 

(b) Where does the calculation $N_0 = \lceil (\ln(b - a) - \ln(TOL))/\ln 2 \rceil$ come from? 


30. Use the code you wrote for question 29 to find solutions 
accurate to within $10^{-5}$ for the following problems. 

(a) $x - 2^{-x} = 0$ on $[0, 1]$ 

(b) $e^x - x^2 + 3x - 2 = 0$ on $[0, 1]$ 

(c) $2x\cos(2x) - (x+1)^2 = 0$ on $[-3, -2]$ and on $[-1, 0]$ 

31. Find an approximation of $\sqrt{3}$ correct to within $10^{-4}$ 
using the bisection method. Write an essay on how 
you solved this problem. Include your bisection code, 
what function and what interval you used and why. 

32. A trough of length L has a cross section in the shape 
of a semicircle with radius r. When filled with water to 
within a distance h of the top, the volume V of water 
is 

$V = L\left[0.5\pi r^2 - r^2 \arcsin\left(\frac{h}{r}\right) - h\sqrt{r^2 - h^2}\right]$ 

Suppose L = 10 ft, r = 1 ft, and V = 12.4 ft^3. Find 
the depth of the water in the trough to within 0.01 ft. 
Note: In Octave, use asin(x) for arcsin(x) and pi for π. 


Answers 

What is the next value?: $T_2(3.25) - \ln(3.25) \approx .10429$, which overshoots the mark. So 3.25 becomes the new 
left endpoint, and the next value is $\frac{3.25+3.5}{2} = 3.375$, the midpoint of 3.25 and 3.5. 

The right endpoint is 13.13: Starting with a = 10 and b = 14, note that g(a) ≈ .088 > 0 and g(b) ≈ -.044 < 0, 
so g of the left endpoint should always be positive and g of the right endpoint should always be negative: 


a         m            b            g(m) 

10        12           14           .044      => m becomes left endpoint 
12        13           14           .006      => m becomes left endpoint 
13        13.5         14           -.017     => m becomes right endpoint 
13        13.25        13.5         -.005     => m becomes right endpoint 
13        13.125       13.25        .0004     => m becomes left endpoint 
13.125    13.1875      13.25        -.002     => m becomes right endpoint 
13.125    13.15625     13.1875      -.0009    => m becomes right endpoint 
13.125    13.140625    13.15625     -.0002    => m becomes right endpoint 
13.125    13.1328125   13.140625 


2.2 Fixed Point Iteration 

Grab your calculator. Anything with a cosine button will do nicely. Presuming you have a simple scientific 
calculator, press the all-clear button, usually marked AC or just C. The screen should now display 0. Press the 
cosine button, which should be marked cos. The screen should display 1. Press the cosine button again. The 
screen should display 0.540302 . . .. Repeat. Repeat again. In fact, continue pressing the cosine button until you 
notice a pattern. 

If you have a fancier calculator with a previous-answer button, usually marked Ans, press 0 and then Enter or =. 
Then press the cosine button and then the previous-answer button. Then press Enter or = to do the computation. 
The first time around, the screen should display 1 (just as with a scientific calculator). To repeat, however, just 
press Enter or = again. This will repeat the last computation. In this case, the cosine of the previous answer. The 
screen should display 0.540302 . . .. Now repeat until you notice a pattern. 

After about 30 repetitions, or, as we will call them, iterations, your calculator should display a number like 
0.739083847 . . .. And no matter how many times you repeat, or iterate, the calculation, it won’t change much. In 
fact, once it reaches 0.7390851332 . . ., it won’t change at all (unless your calculator shows more decimal places — after 
about 90 iterations, a calculator showing 15 decimal places will display 0.739085133215161 and it won’t change from 
there). What that means is cos(0.7390851332 . . .) = 0.7390851332 . . .. And we call 0.7390851332 . . . a fixed point of 
the cosine function. The value is fixed (does not change) when the cosine function is applied. Put another way, at 
0.7390851332 . . ., the input and output of the cosine function are equal. See a simulation of this iteration online at 
the companion website. 
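
The calculator experiment can also be carried out in Octave. The sketch below is only an illustration, with the starting value 0 and the 100 repetitions chosen to mirror the description above; the variable names are not from the text.

```octave
x = 0;                        % same starting value the all-clear button gives
for k = 1:100
  x = cos(x);                 % use the previous answer as the next input
end%for
printf("%.10f\n", x);         % compare with the calculator display described above
residual = abs(cos(x) - x);   % essentially zero when x is a fixed point
```

The residual measures how far x is from satisfying cos(x) = x, which is a direct numerical test of being a fixed point.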

Perhaps a whole series of questions now comes to mind. Why does this work? What if we start with a number 
other than 0? Does this work with any function? Can we predict when it will or won’t work? Can we find roots 
this way? Is convergence fast? In this section and the next, we will give at least partial answers to all of these 
questions. We start with “Why does this work?”. 

Consider solving the system 

  y = cos(x) 
  y = x 

One way to do so is by the method of substitution. If we substitute y = x into y = cos x we get x = cos x or 
cos x = x. The solutions of the system coincide exactly with the fixed points of the cosine function, for any solution 
of cos x = x is a value x that is fixed by the cosine. Since systems of two equations in two unknowns can be solved, 
at least approximately, by graphing, this suggests that we might take a look at the graph of the system in order to 
learn more about what is happening during iteration. 


Figure 2.2.1: Finding the fixed point of cos(x). 

Figure 2.2.1(a) shows the graphs of y = cos(x) and y = x over the interval [0, 1]. We can see the intersection at 
around (0.75, 0.75) so we should think that the fixed point is around 0.75 (which of course we know is true from our 
calculator experiment). Figure 2.2.1(b) illustrates the exercise of computing cos(0), cos(1), cos(0.540302 . . .), and so on. 
Following the vertical line segment from (0,0) to (0,1) represents calculating cos(0). Following the horizontal 
continuation from (0,1) to (1,1) and subsequently the vertical line segment from (1,1) to (1, 0.540302 . . .) represents 
calculating cos(1). Following the horizontal line from (1, 0.540302 . . .) to (0.540302 . . . , 0.540302 . . .) and 
subsequently the vertical line from (0.540302 . . . , 0.540302 . . .) to (0.540302 . . . , 0.857553 . . .) represents calculating 
cos(0.540302 . . .), and so on. With each pair of line segments, one going horizontally from the graph of y = cos(x) 
to the graph of y = x followed by one going vertically from the line y = x to the graph of y = cos(x), another 
iteration is shown. Figure 2.2.1(b) is sometimes called a web diagram [2], and is commonly used to illustrate the 
concept of iteration. That the path of the web diagram tends toward (0.739085 . . . , 0.739085 . . .) is an unavoidable 
consequence of the geometry of the graph of cos(x). 

What if we start with a number other than 0? Using figure 2.2.1, you should be able to convince yourself that 
convergence to the point (0.7390851332 . . . ,0.7390851332 . . .) is assured for any initial value between 0 and 1. Try 
it. Start anywhere on the line y = x. Proceed vertically to the graph of y = cos(x). Then horizontally to the line 
y = x. And repeat. You should find that the path of the web diagram always tends toward the intersection of the 
graphs. Now consider starting with any real number, r. The cosine of any real number is a number in the interval 
$[-1, 1]$ so $\cos(r) \in [-1, 1]$. And the cosine of any number in the interval $[-1, 1]$ is a number in the interval $[0, 1]$ so 
$\cos(\cos(r)) \in [0, 1]$. That is, the second iteration is in the interval from 0 to 1. So after only two iterations, any 
initial value will become a value between 0 and 1. And our web diagram implies that further iteration will lead to 
the fixed point. So, regardless of the initial value, iteration leads to the fixed point. And the preceding argument 
forms the seed for a proof of this fact. 
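
A quick numerical experiment is consistent with this argument. The following sketch iterates the cosine from a few arbitrary starting values; the particular values and the count of 200 iterations are assumptions chosen here for illustration, and a handful of trials is of course evidence rather than proof.

```octave
starts = [-100, 3, 1e6];          % arbitrary sample starting values
limits = zeros(size(starts));
for i = 1:numel(starts)
  x = starts(i);
  for k = 1:200
    x = cos(x);                   % after two steps x lies between 0 and 1
  end%for
  limits(i) = x;
end%for
disp(limits);
```

Each entry of limits can be compared with the fixed point 0.7390851332 . . . found earlier.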

Not all functions are so well behaved, however. For example, $1^2 = 1$. In other words, 1 is a fixed point of 
the function $y = x^2$. However, iteration starting with any number other than 1 or -1 does not lead to this fixed 
point. If we start with any number greater than 1 and square it, it becomes greater. And if we square the result, it 
becomes greater still. And squaring again only increases the value, without bound. Hence, iteration starting with 
any value greater than 1 (or less than -1) does not lead to convergence to the fixed point 1. Nor does iteration 
starting with any number of magnitude less than 1. Figure 2.2.2 illustrates iteration of $y = x^2$ with initial value 0.9. 


Figure 2.2.2: Visualizing the iteration of $f(x) = x^2$. 



Follow the web diagram from the point (0.9, 0.9) vertically to the graph of $y = x^2$ and then horizontally back to 
the line y = x, and so on, to check for yourself. This is a nice illustration of the fact that the square of any number 
between 0 and 1, exclusive, is smaller than the number itself. With starting values between -1 and 1 exclusive 
of ±1, iteration gives a sequence converging to 0, not 1. To summarize, excepting -1 and 1, no initial value will 
produce a sequence converging to 1 under iteration of the function $y = x^2$. 
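
The three behaviors can be seen side by side in a few lines of Octave. This is a minimal sketch, with the starting values 0.9, 1.1, and 1 and the ten iterations chosen here for illustration.

```octave
g = @(x) x.^2;
x_small = 0.9;     % magnitude less than 1
x_big   = 1.1;     % magnitude greater than 1
x_one   = 1;       % the fixed point itself
for k = 1:10
  x_small = g(x_small);
  x_big   = g(x_big);
  x_one   = g(x_one);
end%for
printf("%g  %g  %g\n", x_small, x_big, x_one);
```

Ten squarings raise each starting value to the power 1024, so the first value collapses toward 0, the second grows enormously, and only the third stays at 1.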

There is a fundamental difference between the fixed point 0.7390851332 . . . of f(x) = cos(x) and the fixed point 
1 of $g(x) = x^2$. Fixed point iteration converges to 0.7390851332 . . . under f(x) = cos(x) for any initial value. Fixed 
point iteration fails to converge to 1 under $g(x) = x^2$ for all initial values but ±1.² Examining the graphs of f(x) 
and g(x) each superimposed against the line y = x in the neighborhood of their respective fixed points can give a 
clue [Figure 2.2.3] as to the difference. True, f(x) has a negative slope at its fixed point while g(x) has a positive 
slope at its fixed point. You can see this from the graphs or you can “do the calculus”. The important difference, 
though, is more subtle. It’s not the sign of the slope at the fixed point that matters. It’s the magnitude of the 
slope at the fixed point that matters. For smooth functions, neighborhoods of points with slopes of magnitude 
greater than 1 tend to be expansive. That is, points move away from one another under application of the function. 
However, neighborhoods of points with slopes of magnitude less than 1 tend to be contractive. That is, points move 
toward one another under application of the function. 

Figure 2.2.3: Left: f(x) = cos(x) and y = x. Right: $g(x) = x^2$ and y = x. 

² For a third type of behavior, fixed point iteration converges to 0 under g(x) for initial values near 0, but not for others! 
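
For the two running examples, the slopes can be checked directly. The sketch below assumes the derivatives f'(x) = -sin(x) and g'(x) = 2x, obtained by “doing the calculus”, and uses the rounded fixed point 0.7390851332 from earlier in the section; the variable names are illustrative.

```octave
p = 0.7390851332;          % fixed point of cos, rounded
slope_f  = abs(-sin(p));   % magnitude of f'(p) for f(x) = cos(x); about 0.6736
slope_g1 = abs(2*1);       % magnitude of g'(1) for g(x) = x^2
slope_g0 = abs(2*0);       % magnitude of g'(0) for g(x) = x^2
printf("%.4f  %d  %d\n", slope_f, slope_g1, slope_g0);
```

The magnitudes at 0.739 . . . and at 0 are below 1, while the magnitude at 1 is above 1, matching which fixed points attracted the iterations.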


Proposition 2. If $h(x)$ is differentiable on $(a,b)$ with $|h'(x)| < 1$ for all $x \in (a,b)$, then whenever $x_1, x_2 \in (a,b)$, 
$|h(x_2) - h(x_1)| < |x_2 - x_1|$. 


Proof. Let $x_1, x_2 \in (a,b)$ and, without loss of generality, let $x_2 > x_1$ so that we may properly refer to the interval 
from $x_1$ to $x_2$. Since $h(x)$ is continuous on $[x_1, x_2]$ and differentiable on $(x_1, x_2)$, the mean value theorem gives us 
$c \in (x_1, x_2) \subseteq (a,b)$ such that $h'(c) = \frac{h(x_2) - h(x_1)}{x_2 - x_1}$. But $|h'(c)| < 1$ by assumption, so 
$|h'(c)| = \left|\frac{h(x_2) - h(x_1)}{x_2 - x_1}\right| < 1$, from 
which we immediately conclude that $|h(x_2) - h(x_1)| < |x_2 - x_1|$. □ 


Moreover, a function whose derivative has magnitude less than 1 can only cross the line y = x one time. Once it 
has crossed, it can never “catch up” because that would require a slope greater than 1, the slope of the line y = x. 

Proposition 3. Suppose $h(x)$ is continuous on $[a,b]$, differentiable on $(a,b)$ with $|h'(x)| < 1$ for all $x \in (a,b)$, and 
$h([a,b]) \subseteq [a,b]$. Then $h$ has a unique fixed point in $[a,b]$. 

Proof. If $h(a) = a$ or $h(b) = b$, we have proved existence, so suppose $h(a) \ne a$ and $h(b) \ne b$. Since $h([a,b]) \subseteq [a,b]$ it 
must be the case that $h(a) > a$ and $h(b) < b$. It immediately follows that $h(a) - a > 0$ and $h(b) - b < 0$. Since the 
auxiliary function $f(x) = h(x) - x$ is continuous on $[a,b]$, the Intermediate Value Theorem guarantees the existence 
of $c \in (a,b)$ such that $f(c) = 0$. By substitution, $h(c) - c = 0$, implying $h(c) = c$, so $c$ is a fixed point of $h$. The 
existence of a fixed point is established. Now suppose $c_1 \in [a,b]$ and $c_2 \in [a,b]$ are distinct fixed points of $h$. Then 

$$\frac{h(c_1) - h(c_2)}{c_1 - c_2} = \frac{c_1 - c_2}{c_1 - c_2} = 1.$$ 

By the mean value theorem, there exists $c_3$ between $c_1$ and $c_2$ such that $h'(c_3) = 1$, contradicting the fact that 
$|h'(x)| < 1$ for all $x \in (a,b)$. Hence, it is impossible that $c_1$ and $c_2$ are distinct. □ 
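
As a worked check of the hypotheses (an illustration added here, not part of the proof), take $h(x) = \cos(x)$ on $[a,b] = [0,1]$. Since the cosine is decreasing on this interval:

```latex
\begin{align*}
h([0,1]) &= [\cos 1, \cos 0] \approx [0.5403,\ 1] \subseteq [0,1], \\
|h'(x)| &= |-\sin x| = \sin x < \sin 1 \approx 0.8415 < 1 \quad \text{for all } x \in (0,1).
\end{align*}
```

Both hypotheses hold, so Proposition 3 applies and the cosine has exactly one fixed point in $[0,1]$, in agreement with the calculator experiment.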


Hence, we can reasonably expect that when the derivative at a fixed point has magnitude less than 1, iteration is 
a viable method for approximating (finding) the fixed point, but when the derivative at a fixed point has magnitude 
greater than 1, iteration is not a viable method of approximating the fixed point. We must be careful, though, 
not to take this rule of thumb as absolute. It only applies to so-called well-behaved functions. In this case, that 
the function has a continuous first derivative in the neighborhood of the fixed point is well-behaved enough. The 
following theorem establishes that fixed point iteration will converge in a neighborhood of a fixed point if the 
magnitude of the function’s derivative is less than 1 there. 

Theorem 4. (Fixed Point Convergence Theorem) Given a function $f(x)$ with continuous first derivative and fixed point $\hat{x}$, if $|f'(\hat{x})| < 1$ then there exists a neighborhood of $\hat{x}$ in which fixed point iteration converges to the fixed point for any initial value in the neighborhood.

Proof. By continuity, there exists $\varepsilon > 0$ such that $|f'(x)| < 1$ for all $x \in (\hat{x} - \varepsilon, \hat{x} + \varepsilon)$. Let $0 < \delta < \varepsilon$ and set
\[ M = \max_{x \in [\hat{x} - \delta, \hat{x} + \delta]} |f'(x)|. \]
Now suppose $x_0$ is a particular but arbitrary value in $(\hat{x} - \delta, \hat{x} + \delta)$. As in proposition 2, the Mean Value Theorem is applied. This time, we are guaranteed $c \in (\hat{x} - \delta, \hat{x} + \delta)$ such that
\[ f'(c) = \frac{f(\hat{x}) - f(x_0)}{\hat{x} - x_0}. \]
But $|f'(c)| \le M$ so $|f(\hat{x}) - f(x_0)| \le M|\hat{x} - x_0|$. Furthermore $\hat{x}$ is a fixed point, so $f(\hat{x}) = \hat{x}$, from which it follows that $|\hat{x} - f(x_0)| \le M|\hat{x} - x_0|$. Now we define $x_k = f(x_{k-1})$ for all $k \ge 1$ and prove by induction that $|\hat{x} - x_k| \le M^k|\hat{x} - x_0|$ for all $k \ge 1$. Since $x_1 = f(x_0)$, we have already shown $|\hat{x} - x_1| \le M|\hat{x} - x_0|$, so the claim is true when $k = 1$. Now suppose $|\hat{x} - x_k| \le M^k|\hat{x} - x_0|$ for some particular but arbitrary value $k \ge 1$. Note that $|\hat{x} - x_k| \le M^k|\hat{x} - x_0|$ implies $x_k \in (\hat{x} - \delta, \hat{x} + \delta)$ so we apply the Mean Value Theorem as before and conclude that $|\hat{x} - f(x_k)| \le M|\hat{x} - x_k|$. Substituting $x_{k+1}$ for $f(x_k)$ and using the inductive hypothesis, we have
\[ |\hat{x} - x_{k+1}| \le M \cdot M^k|\hat{x} - x_0| = M^{k+1}|\hat{x} - x_0|. \]
Hence, we have $0 \le |\hat{x} - x_k| \le M^k|\hat{x} - x_0|$. Of course $\lim_{k\to\infty} 0 = 0$ and $\lim_{k\to\infty} M^k|\hat{x} - x_0| = 0$, so by the squeeze theorem, $\lim_{k\to\infty} |\hat{x} - x_k| = 0$. □


As suggested earlier, we should not expect fixed point iteration to converge when the derivative at a fixed point has magnitude greater than one. In fact, more or less the opposite happens. There is a neighborhood of the fixed point in which fixed point iteration is guaranteed to escape the neighborhood for any initial value in the neighborhood not equal to the fixed point itself. Given that fact, it is tempting to think that perhaps the Fixed Point Convergence Theorem could be strengthened to a bi-directional implication, an if-and-only-if claim. And it “almost” can. What can be said here has direct parallels to the ratio test for series. Recall, for any sequence of real numbers $a_0, a_1, a_2, \ldots$, the limit $L = \lim_{k\to\infty} \left|\frac{a_{k+1}}{a_k}\right|$ helps determine the convergence of $\sum_{k=0}^{\infty} a_k$ in the following way:

• If $L < 1$, then $\sum_{k=0}^{\infty} a_k$ converges (absolutely).

• If $L > 1$, then $\sum_{k=0}^{\infty} a_k$ diverges.

• If $L = 1$, then $\sum_{k=0}^{\infty} a_k$ may converge (absolutely or conditionally) or may diverge.

Analogously, for any function $f(x)$ with continuous first derivative and fixed point $\hat{x}$, the derivative $f'(\hat{x})$ helps determine the convergence of the fixed point iteration method in the following way:

• If $|f'(\hat{x})| < 1$, then fixed point iteration converges to $\hat{x}$ for any initial value in some neighborhood of $\hat{x}$.

• If $|f'(\hat{x})| > 1$, then fixed point iteration escapes some neighborhood of $\hat{x}$ for any initial value in the neighborhood other than $\hat{x}$.

• If $|f'(\hat{x})| = 1$, then fixed point iteration may converge to $\hat{x}$ for any initial value in some neighborhood of $\hat{x}$; or may escape some neighborhood for any initial value in the neighborhood other than $\hat{x}$; or may have no neighborhood in which all initial values lead to convergence and no neighborhood in which all values other than $\hat{x}$ escape.
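The three cases can be checked mechanically once a derivative is available. The following is a minimal sketch in Python (used here only for illustration), taking as an assumed example the cubic f(x) = x^3/8 - x^2 + 2x + 1, whose fixed points are 2 and 3 ± sqrt(13). The helper name `classify` and the labels it returns are hypothetical, not part of the text.

```python
import math

def f(x):
    # Example cubic (an assumption for this sketch).
    return x**3 / 8 - x**2 + 2*x + 1

def f_prime(x):
    # Derivative of the example cubic, computed by hand.
    return 3 * x**2 / 8 - 2*x + 2

def classify(slope):
    """Label a fixed point by the magnitude of the derivative there."""
    m = abs(slope)
    if m < 1:
        return "attractive"
    if m > 1:
        return "repulsive"
    return "inconclusive"   # magnitude exactly 1: the derivative alone decides nothing

fixed_points = [3 - math.sqrt(13), 2.0, 3 + math.sqrt(13)]
for p in fixed_points:
    # f(p) - p should be (nearly) zero, confirming p is a fixed point.
    print(p, f(p) - p, f_prime(p), classify(f_prime(p)))
```

In floating point arithmetic a computed derivative will rarely equal 1 exactly, so the third label is best read as a reminder that the borderline case needs a graph or a separate argument.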


The graphs in Figure 2.2.4 of functions with derivative equal to one at their fixed point help illustrate this last case. 


For one of these functions, fixed point iteration converges for all values in a neighborhood of the fixed point. For 
another of these functions, fixed point iteration escapes some neighborhood of the fixed point for all initial values in 
the neighborhood except the fixed point itself. And for the third of these functions, fixed point iteration converges 
to the fixed point for some initial values and escapes a neighborhood of the fixed point for others (and every 
neighborhood of the fixed point will have both types of initial values). Can you tell which is which? Figure it out 
by creating web diagrams for each. Answer on page 55. 

Figure 2.2.4: Convergence behavior when the derivative at the fixed point is 1.

The proof of the Fixed Point Convergence Theorem can easily be extended to include initial values in any neighborhood of the fixed point in which the magnitude of the derivative remains less than 1. The size and symmetry of the interval are not important. For example, $f(x) = \frac{1}{8}x^3 - x^2 + 2x + 1$ has a fixed point at $x = 2$. The proof of the Fixed Point Convergence Theorem establishes convergence to 2 in a symmetric interval about 2 such as $[1.9, 2.1]$. But this interval is far from the largest neighborhood of initial values for which fixed point iteration converges to 2. We can find bounds on the largest such interval by solving the equation $|f'(x)| = 1$. To that end:
\[ \tfrac{3}{8}x^2 - 2x + 2 = \pm 1 \]
\[ 3x^2 - 16x + 16 = \pm 8 \]
\[ 3x^2 - 16x + 24 = 0 \quad\text{or}\quad 3x^2 - 16x + 8 = 0 \]
\[ x = \frac{8 \pm 2i\sqrt{2}}{3} \quad\text{or}\quad x = \frac{8 \pm 2\sqrt{10}}{3}. \]
The real solutions are $\frac{8 - 2\sqrt{10}}{3} \approx 0.558$ and $\frac{8 + 2\sqrt{10}}{3} \approx 4.775$, so we should expect fixed point iteration to converge to 2 on any closed interval contained in $\left(\frac{8 - 2\sqrt{10}}{3}, \frac{8 + 2\sqrt{10}}{3}\right)$.

Now, if we have the computer execute fixed point iteration for a large number of evenly spaced initial values, say 100, on the interval $[-2, 8]$ and record the results on a number line where we color an initial value black if it does not converge to 2 and green if it does converge to 2 (we will call such a diagram a convergence diagram), we get

[Convergence diagram for $f$ on the interval $[-2, 8]$.]

which shows that fixed point iteration converges to 2 on approximately $[-0.5, 6.5]$. Indeed, the experiment confirms that fixed point iteration converges on any closed interval contained in $\left(\frac{8 - 2\sqrt{10}}{3}, \frac{8 + 2\sqrt{10}}{3}\right)$ as predicted. But the diagram shows convergence on an even larger set. We can conclude that the Fixed Point Convergence Theorem gives sufficient but not necessary conditions for convergence in a neighborhood of a fixed point.
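The experiment behind a convergence diagram can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program used to draw the figure: the function name `converges_to`, the tolerance, the iteration cap, and the `blowup` threshold used to detect escaping iterates are all choices made here.

```python
def f(x):
    return x**3 / 8 - x**2 + 2*x + 1

def converges_to(f, x0, target, tol=1e-8, max_iter=1000, blowup=1e6):
    """Return True if iterating f from x0 comes within tol of target."""
    x = x0
    for _ in range(max_iter):
        x = f(x)
        if abs(x) > blowup:          # iterates have escaped; stop before overflow
            return False
        if abs(x - target) < tol:    # 2 is attractive, so this close means converging
            return True
    return False

n = 100
starts = [-2 + 10 * i / (n - 1) for i in range(n)]     # evenly spaced on [-2, 8]
converged = [x0 for x0 in starts if converges_to(f, x0, 2.0)]

# "Green" initial values; every other grid point would be colored black.
print(min(converged), max(converged), len(converged))
```

The smallest and largest converging initial values bracket the green portion of the diagram, which can then be compared with the interval predicted by solving $|f'(x)| = 1$.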

A graph of the function f(x) superimposed on the line y = x (Figure 2.2.5) gives some insight as to why the 
bounds $\frac{8 \pm 2\sqrt{10}}{3}$ do not tell a complete story. By imagining the web diagram for any initial value between the two
fixed points other than 2, that is —0.61 and 6.61, you should be able to convince yourself that fixed point iteration 
converges to 2 for any initial value in the interval (—0.61,6.61). Can you prove it? Graphs like those in Figures 
2.2.3, 2.2.4, and 2.2.5 are indispensable and should always be consulted when trying to understand fixed point 
iteration, but they should not be relied upon as proof. For that, we need to rely on theorems like the Fixed Point 
Convergence Theorem. 


Crumpet 10: One interesting quadratic

Trying to find roots of the logistic equation

$g(x) = (a - 1)x - ax^2$

by applying fixed point iteration to the corresponding function $f(x) = x + g(x) = ax(1 - x)$ is a famous exercise in dynamical systems which has a nasty habit of not working! Complete the following investigation to see what happens.

1. Show that $f(x) = ax(1 - x)$ as claimed.

2. For each of the values $a = 2.5$, $a = 3.2$, $a = 3.833$, and $a = 4$, do the following.

(a) Find the positive fixed point of $f$ (root of $g$) analytically (using a pencil, paper, and some algebra).

(b) Set $x_0 = 0.1$ and use a computer program to calculate $x_{975}$ through $x_{1000}$.

(c) Examine the 26 iterations of part (b) and describe what you see.

3. Draw a connection between your results from part 2 and the following diagram.



4. Use the diagram to predict a value of $a$ for which you would expect fixed point iteration to lead to $x_{975}$ through $x_{1000}$ cycling through 4 different values. Check your prediction.

Figure 2.2.6: Convergence diagrams for 6 functions with the same fixed points.

[Convergence diagrams for $f_1$ through $f_6$ over the interval $[-2, 5]$.]

black: does not converge; green: converges to 3; red: converges to $1 + \sqrt{3}$; blue: converges to $1 - \sqrt{3}$


Root Finding 

When successful, fixed point iteration finds solutions of an equation of the form $f(x) = x$. A root finding problem requires the solution of an equation of the form $g(x) = 0$. However, the equation $f(x) = x$ has exactly the same solutions as the equation $f(x) - x = 0$, so finding fixed points of $f(x)$ is equivalent to finding roots of $g(x) = f(x) - x$. Indeed, we can rephrase the example of finding fixed points of $f(x) = \frac{1}{8}x^3 - x^2 + 2x + 1$ as the problem of finding roots of $g(x) = f(x) - x = \frac{1}{8}x^3 - x^2 + x + 1$. But it is the opposite problem that is much more common. We have the question of finding the roots of a function and need to rephrase it in terms of a fixed point problem.

Suppose we want the roots of $g(x) = -x^3 + 5x^2 - 4x - 6$. We can rephrase the question of solving $g(x) = 0$ as the problem of finding the fixed points of many different functions! But you will have to ignore some sage advice of your algebra teacher to derive them! The key is to use algebra to rewrite the equation $-x^3 + 5x^2 - 4x - 6 = 0$ as an equation of the form $x = f(x)$. The simplest way is to add $x$ to both sides of the equation. This manipulation and several others are shown in the following list.

• $-x^3 + 5x^2 - 4x - 6 = 0 \Rightarrow -x^3 + 5x^2 - 3x - 6 = x$

• $-x^3 + 5x^2 - 4x - 6 = 0 \Rightarrow -x^3 + 5x^2 - 6 = 4x \Rightarrow \frac{-x^3 + 5x^2 - 6}{4} = x$

• $-x^3 + 5x^2 - 4x - 6 = 0 \Rightarrow -x^3 - 4x - 6 = -5x^2 \Rightarrow \frac{x^3 + 4x + 6}{5} = x^2 \Rightarrow \pm\sqrt{\frac{x^3 + 4x + 6}{5}} = x$

• $-x^3 + 5x^2 - 4x - 6 = 0 \Rightarrow 5x^2 - 4x - 6 = x^3 \Rightarrow \sqrt[3]{5x^2 - 4x - 6} = x$


Can you see what has been done for each one? Thus, we have five candidates for fixed point iteration, $f_1(x) = -x^3 + 5x^2 - 3x - 6$, $f_2(x) = \frac{-x^3 + 5x^2 - 6}{4}$, $f_3(x) = \sqrt{\frac{x^3 + 4x + 6}{5}}$, $f_4(x) = -\sqrt{\frac{x^3 + 4x + 6}{5}}$, and $f_5(x) = \sqrt[3]{5x^2 - 4x - 6}$, all of which will potentially give roots of $g(x)$. There is a sixth function we will discuss in much more detail later: $f_6(x) = \frac{2x^3 - 5x^2 - 6}{3x^2 - 10x + 4}$.$^3$ The roots of $g(x)$ are $1 - \sqrt{3} \approx -0.73$, $1 + \sqrt{3} \approx 2.73$, and 3, so we will consider convergence diagrams over the interval $[-2, 5]$. Fixed point iteration converges to different fixed points for the different functions despite the fact that all 6 functions have exactly the same three fixed points. The convergence diagrams of Figure 2.2.6 are color-coded to reflect this fact. Black indicates lack of convergence just as before. Green, red, and blue indicate convergence to 3, $1 + \sqrt{3}$, and $1 - \sqrt{3}$, respectively. Notice that only $f_6$ provides convergence for, as far as we can tell, every initial value in $[-2, 5]$, and is also the only one for which fixed point iteration converges to different fixed points for different initial values. See if you can understand why each function has the convergence behavior it does by looking at the graphs of $f_1, f_2, \ldots, f_6$. Pay special attention to the graphs around $1 + \sqrt{3}$ and 3. Looks can be deceiving in that area because the two fixed points are so close together. Also, see if you can find two initial values in $[-2, 5]$ for which fixed point iteration on $f_6$ does not converge. What happens instead? For an extra challenge, see if you can find a third point in $[-2, 5]$ for which fixed point iteration on $f_6$ does not converge. Hint: you may need to use a computer algebra system to find such a point exactly or use fixed point iteration to approximate it! Answers on page 55.

3 By calculating $f_6(1 - \sqrt{3})$, $f_6(1 + \sqrt{3})$, and $f_6(3)$, you can verify that $f_6$ has these three values as fixed points as well.
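To make the idea of rephrasing concrete, here is a small Python sketch that iterates two of the candidates, $f_1$ and $f_2$, and checks that the value each one settles on really is a root of $g$. The helper `iterate`, the starting values 2.75 and 2.9, and the tolerance are assumptions chosen for illustration.

```python
def g(x):
    return -x**3 + 5*x**2 - 4*x - 6

def f1(x):
    return -x**3 + 5*x**2 - 3*x - 6      # from adding x to both sides

def f2(x):
    return (-x**3 + 5*x**2 - 6) / 4      # from isolating the 4x term

def iterate(f, x0, tol=1e-12, max_iter=500):
    """Return (approximation, iterations), or (None, max_iter) on failure."""
    for k in range(1, max_iter + 1):
        x = f(x0)
        if abs(x - x0) < tol:
            return x, k
        x0 = x
    return None, max_iter

results = {}
for name, f, x0 in [("f1", f1, 2.75), ("f2", f2, 2.9)]:
    approx, steps = iterate(f, x0)
    results[name] = (approx, steps)
    print(name, approx, steps, g(approx))   # g(approx) should be near zero
```

Both runs approach the same root, but the iteration counts differ considerably, which is one reason the choice of rephrasing matters.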


The Fixed Point Iteration Method (pseudo-code) 

Though we spent a lot of time talking about how to determine whether we should expect the fixed point iteration 
method to converge or not, none of that information is strictly relevant to coding the method. Any implementation 
of the method should allow the user to try fixed point iteration for any function with any initial value. It is the user’s 
responsibility to understand that when the assumptions are not met, the results are unpredictable. Remember, 
“garbage in... garbage out.” 

The fixed point iteration method presents a problem that the bisection method did not. In the bisection method, there was a simple and convenient formula for an upper bound on the error. To provide something similar in the fixed point iteration method, one would have to sacrifice simplicity or convenience or both, but the benefits do not outweigh the sacrifice. Instead, a more general stopping criterion is used. When two consecutive iterations are closer to one another than a given tolerance, the method stops. At this point, the difference between iterations, say $x_k$ and $x_{k+1}$, is smaller than the tolerance. For a sequence derived from fixed point iteration, $x_{k+1} = f(x_k)$, so $|x_{k+1} - x_k| = |f(x_k) - x_k|$. When $|x_{k+1} - x_k|$ is small, $|f(x_k) - x_k|$ is small, so $f(x_k) \approx x_k$. $x_k$ is “almost” a fixed point.


Assumptions: $f$ is differentiable. $f$ has a fixed point $\hat{x}$. $x_0$ is in a neighborhood $(\hat{x} - \delta, \hat{x} + \delta)$ where the magnitude of $f'$ is less than one.

Input: Initial value $x_0$; function $f$; desired accuracy tol; maximum number of iterations $N$.

Step 1: For $j = 1 \ldots N$ do Steps 2-4:

Step 2: Set $x = f(x_0)$;

Step 3: If $|x - x_0| < tol$ then return $x$;

Step 4: Set $x_0 = x$;

Step 5: Print “Method failed. Maximum iterations exceeded.”

Output: Approximation $x$ near exact fixed point, or message of failure.
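For readers who want to see the pseudo-code run before writing their own version, the following is one possible transcription into Python (chosen here only for illustration). The function name `fixed_point_iteration`, the decision to return the iteration count alongside the approximation, and the use of an exception for the failure message are assumptions of this sketch.

```python
import math

def fixed_point_iteration(f, x0, tol, max_iter):
    """Return (x, iterations) once two consecutive iterates differ by less than tol.

    Raises RuntimeError if max_iter iterations pass without meeting the tolerance.
    """
    for j in range(1, max_iter + 1):    # Step 1
        x = f(x0)                       # Step 2
        if abs(x - x0) < tol:           # Step 3
            return x, j
        x0 = x                          # Step 4
    raise RuntimeError("Method failed. Maximum iterations exceeded.")  # Step 5

# Example: the fixed point of cosine, starting from 1.
x, n = fixed_point_iteration(math.cos, 1.0, 1e-10, 100)
print(x, n)
```

As the text stresses, nothing in the code checks the assumptions; a function or initial value that violates them simply produces the failure message or an unpredictable result.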

Key Concepts 

Fixed point: $x_0$ is a fixed point of the function $f(x)$ if $f(x_0) = x_0$.

Fixed point iteration: Calculating the sequence $x_0, x_1 = f(x_0), x_2 = f(x_1), x_3 = f(x_2), \ldots$ given the function $f$ and initial value $x_0$.

Attractive fixed point: A fixed point is called attractive (or attracting) if there is a neighborhood of the fixed point in which fixed point iteration converges for all initial values in the neighborhood.

Repulsive fixed point: A fixed point is called repulsive (or repelling) if fixed point iteration escapes some neighborhood of the fixed point for any initial value in the neighborhood other than the fixed point itself.

Mean Value Theorem: If $f$ is continuous on $[a,b]$ and has a derivative on $(a,b)$, then there exists $c \in (a,b)$ such that $f'(c) = \frac{f(b) - f(a)}{b - a}$.

Fixed Point Convergence Theorem: Given a function $f(x)$ with continuous first derivative and fixed point $\hat{x}$, if $|f'(\hat{x})| < 1$ then there exists a neighborhood of $\hat{x}$ in which fixed point iteration converges to the fixed point for any initial value in the neighborhood.

Exercises 

1. Write an Octave implementation of the fixed point iteration method. Save it as a .m file for future use.

2. (i) Decide whether or not the hypotheses of the Mean Value Theorem are met for the function over the interval. (ii) If the hypotheses are met, find a value $c$ as guaranteed by the theorem.

(a) $f(x) = 3 - x - \sin x$; $[2, 3]$

(b) $g(x) = 3x^4 - 2x^3 - 3x + 2$; $[0, 1]$

(c) $g(x) = 3x^4 - 2x^3 - 3x + 2$; $[0, 0.9]$ [S]

(d) $h(x) = 10 - \cosh(x)$; $[-3, -2]$ [A]

(e) 

f(t) 

+ 

II 

5sint — 2.5; [—6, 

-5] 

(f) 

g(t) 

- [20,23] [s) 


(g) 

hit) 

= ln(3 sin t) - f ; [2,4] [A1 


GO 

f( r ) 

g S ^ n r 

- r; [-20, 20] [A] 


(i) 

g(r) 

= sin(e’ 

l+v, [-3,3] 


(j) 

hir) 

2 sm r 

-3 cosr -, [1,3] 



3. Find the fixed point(s) of the function exactly. Use algebra.

(a) f(x ) = \/2x 3 — x 2 — x 

(b) f{x) = ^ 

(c) $f(x) = \log(x^2 - 3x) - 1 + x$

(d) $g(x) = 3x^2 + 5x + 1$

(e) gix)mx + T ^ 3 T- 2500 

(f) g(x) = e ln U+i)-3 

(g) $h(x) = \sqrt{4x^2 + 4x + 1}$

(h) $h(x) = x - 10 + 3^x + 25 \cdot 3^{-x}$ [S]

(i) $h(x) = x + 6 - 3\log_5(2x)$

4. Find at least two candidate functions, $f_1(x)$ and $f_2(x)$, for finding roots of $g(x)$ via fixed point iteration. In other words, convert the problem of finding a root of $g$ into a problem of finding a fixed point of $f_1$ or $f_2$.

(a) $g(x) = 7x^2 + 5x - 9$

(b) $g(x) = x + \cos x$

(c) $g(x) = 6x^5 + 12x^2 - 8$

(d) $g(x) = x^2 - e^{3x+4}$ [S]

(e) $g(x) = 7x - 3\cos(\pi x - 2) + \ln|2x^2 + 4x - 8|$

(f) gix) = 3 x2 ~ 5x+1 - 2~ x2 ~ 5x - 1 [a > 

5. Compute the first 5 iterations of the fixed point iteration method using the given function and initial value. Based on these 5 iterations, do you expect the method to converge?

(a) $f(x) = 3 - \sin x$; $x_0 = 2$

(b) $g(x) = 10 + x - \cosh(x)$; $x_0 = -3$

(c) hit) = ln(3sinf) + y; to = 1 ^ 

(d) w(r) = 2 ainr - 3 cosr + r; r 0 = 1 

6. Use your Octave function from question 1 with the function and initial value in question 5. Set the tolerance to $10^{-10}$ and the maximum iterations to 100. Does the method converge within 100 iterations? If so, to what value? Report at least 10 significant digits. [S] [A]

7. Construct a web diagram for each function/initial value pair in question 5.

8. Compare the results from question 6 with the results 
of question 7. Are they consistent with one another? 


9. Use proposition 3 to show that $g(x) = 2x(1 - x)$ has a unique fixed point on $[0.3, 0.7]$.

10. Let fix) = ^. [S1 

(a) Show that / has a unique fixed point on 
[-4, -0.9]. 

(b) Use fixed point iteration to find an approximation to the fixed point that is accurate to within $10^{-2}$.

11. Let $g(x) = \pi + 0.5\sin(x/2)$.

(a) Show that $g$ has a unique fixed point on $[0, 2\pi]$.

(b) Use fixed point iteration to find an approximation to the fixed point that is accurate to within $10^{-2}$.

12. Show that the fixed point iteration method applied to $f(x) = \sqrt[3]{8 - 4x}$ will converge to a root of $g(x) = x^3 + 4x - 8$ for any initial value $x_0 \in [1.2, 1.5]$.

13. Show that fixed point iteration is guaranteed to converge to the fixed point of

$f(x) = (\sqrt{2})^x$

for any $x_0 \in [1, 3]$. HINT: $f'(x) = \frac{1}{2}\ln(2) \cdot (\sqrt{2})^x$.

14. Let $g(x) = x^2 - 3x - 2$.

(a) Find a function $f$ on which fixed point iteration will converge to a root of $g$.

(b) Use your function to find a root of $g$ to within $10^{-3}$ of the exact value.

(c) State the initial value you used and how many 
iterations it took to get the approximation. 

15. Use fixed point iteration with $p_0 = -1$ to approximate a root of $g(x) = x^3 - 3x + 3$ accurate to the nearest $10^{-4}$.

16. Use a fixed point iteration method to find an approximation of $\sqrt{3}$ that is accurate to within $10^{-4}$. What function and initial value did you use?

17. The function $f(x) = x^4 + 2x^2 - x - 3$ has two roots. One of them is in $[-1, 0]$ and the other is in $[1, 2]$.

(a) In preparation for finding a root of $f(x)$ using fixed point iteration, one way to manipulate the equation $x^4 + 2x^2 - x - 3 = 0$ is to add $x$ to both sides. This gives

$x = x^4 + 2x^2 - 3$

Draw appropriate graphs to determine whether iteration of the function $g(x) = x^4 + 2x^2 - 3$ will find the root in $[-1, 0]$. How about the root in $[1, 2]$? Explain how you came to your conclusions.

(b) Manipulate the equation $x^4 + 2x^2 - x - 3 = 0$ in such a way that fixed point iteration does work to find the root in $[-1, 0]$. Draw the graphs that demonstrate that your method will work.

(c) Does the same manipulation allow you to find the root in $[1, 2]$? If not, find another manipulation that will. Again, show the graphs that demonstrate that your method will work.

(d) Use your method(s) from parts 17b and 17c to find the two roots accurate to 3 decimal places.

18. Fixed point iteration on $f(x) = \sqrt[3]{2x^3 - x^2 - x}$ will not converge to a fixed point. However, fixed point iteration on the function $g(x) = \sqrt[3]{x^2 + x}$ will converge to approximately 1.618033988749895 for any $x_0$ in $[0.5, 3.5]$. [A]

(a) How many iterations does it take to achieve $10^{-4}$ accuracy using $g(x)$ with $x_0 = 2.5$?

(b) Explain why $f(x)$ and $g(x)$ have the same fixed points.

19. Find a zero (any zero) of $g(x) = x^2 + 10\cos x$ accurate to within $10^{-4}$ using fixed point iteration. State

(a) the function $f$ to which you applied fixed point iteration

(b) the initial value, $x_0$, you used

(c) how many iterations it took

20. Let $c$ be a nonzero real number. Argue that any fixed point of $f(x) = xe^{c \cdot g(x)}$ is a root of $g$.

21. Approximate \/3 using the method suggested by ques- 
tion 20. 

22. Suppose $g(\hat{x}) = 0$ and $g$ has a continuous first derivative. Argue that there exists a value $c$ for which fixed point iteration on $f(x) = x + cg(x)$ will converge to $\hat{x}$ on some neighborhood of $\hat{x}$.

23. Find a value of $c$ for which fixed point iteration is guaranteed to converge for the function $f(x) = x + c(x - 5\cos x)$ with any initial value $x_0 \in [0, \pi/2]$. Explain. [A]


24. Let g(x) =\* + | x - 1(T 5 . 

(a) Show that if $g(x)$ has a zero at $p$, then the function $f(x) = x + cg(x)$ has a fixed point at $p$.

(b) Find a value of $c$ for which fixed point iteration of $f(x)$ will successfully converge for any starting value, $p_0$, in the interval $[16, 17]$. Sketch the graphs that demonstrate that your choice of $c$ is appropriate.

(c) Use the function from part 24b with the value of $c$ you have determined to find a root of $g(x)$ accurate to within $10^{-4}$. State the value you used for $p_0$. Show the last 3 iterations. How many iterations did it take?

25. Prove that for $f(x) = \cos x$, fixed point iteration converges for any initial value.

26. The Fixed Point Convergence Theorem can be strengthened. The requirement that the first derivative be continuous can be replaced. Modify the proof in the text to show the following claim.

Given a differentiable function $f(x)$ with fixed point $\hat{x}$, if $|f'(x)| \le M < 1$ for all $x$ in some neighborhood of $\hat{x}$, then fixed point iteration converges to the fixed point for any initial value in the neighborhood.

27. Create three graphs similar to those in Figure 2.2.4 to 
analyze the situation when the derivative at the fixed 
point equals —1. Does the situation differ from that 
when the derivative at the fixed point equals 1? 


Answers 


Figure 2.2.4: From left to right: every neighborhood of the fixed point will have both types of initial values; fixed point iteration converges for all values in a neighborhood of the fixed point; fixed point iteration escapes some neighborhood of the fixed point for all initial values in the neighborhood except the fixed point itself.

Figure 2.2.6: When its denominator is zero, $f_6(x)$ will be undefined (there is a vertical asymptote in the graph), so we solve $3x^2 - 10x + 4 = 0$ to find two initial values for which fixed point iteration will fail (since the first iteration will be undefined). They are $x = \frac{5 \pm \sqrt{13}}{3} \approx 0.4648$ and $2.868$. To find a third point for which fixed point iteration will fail, we solve the equation $f_6(x) = \frac{5 + \sqrt{13}}{3}$ (we could just as easily have solved $f_6(x) = \frac{5 - \sqrt{13}}{3}$ instead). Then the second iteration will be undefined since the first iteration will be $\frac{5 + \sqrt{13}}{3}$. The only real solution is approximately 1.055909763230534, which can be found by fixed point iteration on


Prove it. Note, though, the claim that fixed point iteration will fail is 
based on the assumption of exact arithmetic. The fact that any reasonable implementation of the fixed point 
iteration method will involve floating point arithmetic might provide just enough error for the method to 
converge even for these initial values. 




2.3 Order of Convergence for Fixed Point Iteration 

Suppose $f$ is a function with fixed point $\hat{x}$ and $f'(\hat{x})$ exists. Let $x_0, x_1, x_2, \ldots$ be a sequence derived from fixed point iteration ($x_{k+1} = f(x_k)$ for all $k \ge 0$) such that $\lim_{k\to\infty} x_k = \hat{x}$ and $x_k \ne \hat{x}$ for all $k = 0, 1, 2, \ldots$. Then
\[ \frac{|x_{n+1} - \hat{x}|}{|x_n - \hat{x}|} = \left|\frac{f(x_n) - f(\hat{x})}{x_n - \hat{x}}\right| \]
and
\[ \lim_{n\to\infty} \left|\frac{f(x_n) - f(\hat{x})}{x_n - \hat{x}}\right| = |f'(\hat{x})|. \qquad (2.3.1) \]

Therefore, fixed point iteration is linearly convergent as long as $f'(\hat{x}) \ne 0$. The following proposition could be presented as a corollary to the Fixed Point Convergence Theorem since much of the argument simply repeats what was noted there, but we choose to present it as a separate claim based on equation 2.3.1. To be more precise, we have the following result.


Proposition 5. (Fixed Point Error Bound) Let $f$ be a differentiable function with fixed point $\hat{x}$ and let $[a,b]$ be an interval containing $\hat{x}$. If $|f'(x)| \le M < 1$ for all $x \in [a,b]$ and $f([a,b]) \subseteq [a,b]$, then for any initial value $x_0 \in [a,b]$, fixed point iteration, with $x_{k+1} = f(x_k)$ for all $k \ge 0$, gives an approximation of $\hat{x}$ with absolute error no more than $M^k|x_0 - \hat{x}|$.

Proof. An elementary induction proof (requested in the exercises) will establish that $x_k \in [a,b]$ for all $k \ge 0$. We proceed to prove the error bound. The absolute error in approximating $\hat{x}$ by $x_0$ is $|x_0 - \hat{x}| = M^0|x_0 - \hat{x}|$ so the claim is true for $k = 0$. Now suppose the claim is true for some particular but arbitrary $k \ge 0$. By the Mean Value Theorem, there is a $c$ in the interval from $\hat{x}$ to $x_k$ such that $f'(c) = \frac{f(x_k) - f(\hat{x})}{x_k - \hat{x}}$. Since $\hat{x}$ and $x_k$ are both in $[a,b]$, so is $c$. It follows that $|f'(c)| \le M$, so $|f(x_k) - f(\hat{x})| \le M|x_k - \hat{x}|$. But $\hat{x}$ is a fixed point of $f$, so $f(\hat{x}) = \hat{x}$, from which it follows that $|f(x_k) - \hat{x}| \le M|x_k - \hat{x}|$, and, therefore, that $|x_{k+1} - \hat{x}| \le M|x_k - \hat{x}|$. By the inductive hypothesis, $|x_k - \hat{x}| \le M^k|x_0 - \hat{x}|$, so $|x_{k+1} - \hat{x}| \le M \cdot M^k|x_0 - \hat{x}| = M^{k+1}|x_0 - \hat{x}|$. □
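The error bound can be watched in action with a short Python sketch. The example is an assumption chosen for convenience: $f(x) = \cos x$ on $[a,b] = [0,1]$, where $f([0,1]) \subseteq [0,1]$ and $|f'(x)| = |\sin x| \le \sin 1 < 1$, so $M = \sin 1$ satisfies the hypotheses. The reference value `x_hat` is the fixed point of cosine to double precision.

```python
import math

x_hat = 0.7390851332151607    # fixed point of cos (reference value, assumed)
M = math.sin(1.0)             # bound on |f'(x)| = |sin x| over [0, 1]
x0 = 1.0

x = x0
bound_holds = []
for k in range(31):
    error = abs(x - x_hat)               # actual absolute error of x_k
    bound = M**k * abs(x0 - x_hat)       # bound promised by the proposition
    bound_holds.append(error <= bound)
    if k % 5 == 0:
        print(k, error, bound)
    x = math.cos(x)                      # advance to x_{k+1}

print(all(bound_holds))
```

The actual errors shrink noticeably faster than the bound, which is expected: $M$ is a worst case over the whole interval, while the iterates soon live near $\hat{x}$, where $|f'|$ is smaller.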


When $f'(\hat{x}) = 0$, equation 2.3.1 shows that fixed point iteration does not converge linearly. For any sequence $(p_n)$ converging to $p$, if $\lim_{n\to\infty} \frac{|p_{n+1} - p|}{|p_n - p|} = 0$ we say the sequence is superlinearly convergent or that convergence is faster than linear.

Consider the functions $f(x) = \frac{1}{8}x^3 - x^2 + 2x + 1$ and $f_1(x) = -x^3 + 5x^2 - 3x - 6$ from section 2.2. Recall 2 is a fixed point of $f$ and 3 is a fixed point of $f_1$ and observe that $f'(2) = \frac{3}{8} \cdot 2^2 - 2 \cdot 2 + 2 = -\frac{1}{2}$ and $f_1'(3) = -3 \cdot 3^2 + 10 \cdot 3 - 3 = 0$. Consequently, we should expect fixed point iteration of $f_1$ to converge to 3 faster than that of $f$ converges to 2. With $s_0, s_1, s_2, \ldots = 1.75, f(1.75), f(f(1.75)), \ldots$ and $t_0, t_1, t_2, \ldots = 2.75, f_1(2.75), f_1(f_1(2.75)), \ldots$, table 2.1 shows the relative speeds of convergence. $(s_n)$ is converging linearly as expected, and $(t_n)$ seems to be converging quadratically. The last four exponents in the $|3 - t_n|$ column are $-4, -8, -14, -27$, indicating that the number of significant digits of accuracy is approximately doubling with each iteration. In other words, the error of one term is roughly the square of the previous error (meaning $\alpha = 2$ in the definition of order of convergence).

Table 2.1: Comparing order of convergence for fixed point iteration when the derivative at the fixed point is not zero $(s_n)$ to that when the derivative at the fixed point is zero $(t_n)$.

 n     |2 - s_n|        |3 - t_n|
 0     2.5(10)^-1       2.5(10)^-1
 1     1.074(10)^-1     2.343(10)^-1
 2     5.644(10)^-2     2.068(10)^-1
 3     2.740(10)^-2     1.623(10)^-1
 4     1.388(10)^-2     1.010(10)^-1
 5     6.894(10)^-3     3.984(10)^-2
 6     3.459(10)^-3     6.286(10)^-3
 7     1.726(10)^-3     1.578(10)^-4
 8     8.640(10)^-4     9.966(10)^-8
 9     4.318(10)^-4     3.973(10)^-14
 10    2.159(10)^-4     6.317(10)^-27

Table 2.2: Accelerating the convergence of a linearly converging sequence.

 n    c_n      a_n        |c_n - c|       |a_n - c|       |a_n - c|/|c_{n+2} - c|    |a_n - c|/|a_{n-1} - c|^2
 0    1        .728010    2.609(10)^-1    1.107(10)^-2    .0934                      .0110
 1    .5403    .733665    1.987(10)^-1    5.419(10)^-3    .0639                      44.19
 2    .8575    .736906    1.184(10)^-1    2.178(10)^-3    .0400                      74.17
 3    .6542    .738050    8.479(10)^-2    1.034(10)^-3    .0274                      217.9
 4    .7934    .738636    5.439(10)^-2    4.490(10)^-4    .0180                      419.4
 5    .7103    .738876    3.771(10)^-2    2.085(10)^-4    .0122                      1034
 6    .7639    .738992    2.487(10)^-2    9.289(10)^-5    .0081
 7    .7221
 8    .7504
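The entries of Table 2.1 can be recomputed with a short Python sketch. Because the last errors in the $|3 - t_n|$ column are far smaller than double precision can resolve near 3, this sketch assumes exact rational arithmetic via the standard library's `fractions.Fraction`; the helper name `errors` is hypothetical. It also prints the ratios that distinguish linear from quadratic convergence.

```python
from fractions import Fraction

def f(x):
    return x**3 / 8 - x**2 + 2*x + 1

def f1(x):
    return -x**3 + 5*x**2 - 3*x - 6

def errors(func, start, fixed_point, n_terms):
    """Exact absolute errors |fixed_point - x_n| for n = 0, ..., n_terms - 1."""
    x = Fraction(start)
    out = [abs(fixed_point - x)]
    for _ in range(n_terms - 1):
        x = func(x)
        out.append(abs(fixed_point - x))
    return out

err_s = [float(e) for e in errors(f, "1.75", 2, 11)]    # |2 - s_n|
err_t = [float(e) for e in errors(f1, "2.75", 3, 11)]   # |3 - t_n|

for n in range(11):
    print(n, f"{err_s[n]:.3e}", f"{err_t[n]:.3e}")

# Linear convergence: error ratio settles near |f'(2)|.
linear_ratios = [err_s[n + 1] / err_s[n] for n in range(10)]
# Quadratic convergence: error over squared previous error settles near a constant.
quadratic_ratios = [err_t[n + 1] / err_t[n]**2 for n in range(10)]
print(linear_ratios[-1], quadratic_ratios[-1])
```

The exact fractions grow to many thousands of digits by the tenth term, so this approach is only practical for a handful of iterations; it is meant as a check on the table, not as a general technique.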







Taylor’s theorem will provide the proof we need that this convergence really is quadratic. Suppose $f$ has a third derivative in a neighborhood of $\hat{x}$. Define $e_n = \hat{x} - x_n$. Then according to Taylor’s theorem, $\hat{x} = f(\hat{x}) = f(x_n + e_n) = f(x_n) + e_n f'(x_n) + \frac{1}{2}e_n^2 f''(x_n) + O(e_n^3)$. But $f(x_n) = x_{n+1}$ so we get
\[ \hat{x} - x_{n+1} = e_{n+1} = e_n f'(x_n) + \tfrac{1}{2}e_n^2 f''(x_n) + O(e_n^3). \qquad (2.3.2) \]
Also from Taylor’s theorem, $f'(\hat{x}) = f'(x_n + e_n) = f'(x_n) + e_n f''(x_n) + O(e_n^2)$. But $f'(\hat{x}) = 0$ so
\[ f'(x_n) = -e_n f''(x_n) - O(e_n^2). \qquad (2.3.3) \]
Substituting 2.3.3 into 2.3.2,
\[ e_{n+1} = e_n\left(-e_n f''(x_n) - O(e_n^2)\right) + \tfrac{1}{2}e_n^2 f''(x_n) + O(e_n^3) = -\tfrac{1}{2}e_n^2 f''(x_n) + O(e_n^3). \]
Hence, $\frac{e_{n+1}}{e_n^2} = -\frac{1}{2}f''(x_n) + O(e_n)$ and
\[ \lim_{n\to\infty} \frac{|\hat{x} - x_{n+1}|}{|\hat{x} - x_n|^2} = \lim_{n\to\infty} \left|-\tfrac{1}{2}f''(x_n) + O(e_n)\right| = \tfrac{1}{2}|f''(\hat{x})|, \]
showing that convergence is at least quadratic. If $f''(\hat{x})$ happens to be 0, then the convergence is superquadratic.

To summarize, on the off-chance that, at a fixed point $\hat{x}$, $f'(\hat{x}) = 0$, fixed point iteration is successful and fast for initial values near $\hat{x}$. But when $f'(\hat{x}) \ne 0$, fixed point iteration may fail to converge to $\hat{x}$, and when it does converge, the convergence is slow. There is a quick fix (quick to implement, not quick to explain) for some of this deficiency when $f'(\hat{x}) \ne 0$, however. We will first concentrate on the speed of convergence.

Let the sequence $(c_n)$ be defined by

$c_0 = 1$

$c_k = \cos(c_{k-1})$, $k > 0$.

You should be able to verify that the first few terms of this sequence are (approximately)

1, .5403, .8575, .6542, .7934, . . .

This is exactly the sequence you created in the calculator experiment on page 46 of section 2.2. Define a new sequence $(a_n)$ by
\[ a_n = c_n - \frac{(c_{n+1} - c_n)^2}{c_{n+2} - 2c_{n+1} + c_n}. \]
Table 2.2 shows the first few terms of each sequence along with some error analysis. As promised, the sequence $(a_n)$ is converging more quickly than $(c_n)$, evidenced by the fact that $\frac{|a_n - c|}{|c_{n+2} - c|}$ is tending to zero. The last column of the table indicates that the convergence of $(a_n)$ to $c$ is not quadratic, however.

More generally, suppose $(p_n)$ is any sequence that converges linearly to $p$. Then we have $\lim_{n\to\infty} \frac{|p - p_{n+1}|}{|p - p_n|} = \lambda \ne 0$, so we should expect $\frac{|p - p_{n+2}|}{|p - p_{n+1}|} \approx \frac{|p - p_{n+1}|}{|p - p_n|} \approx \lambda$ for large enough $n$, from which we get $|(p - p_{n+2})(p - p_n)| \approx |p - p_{n+1}|^2$. Assuming $p - p_{n+2}$ and $p - p_n$ have the same sign for large $n$ $^4$, we can remove the absolute values to find
\[ (p - p_{n+2})(p - p_n) \approx (p - p_{n+1})^2 \]
\[ p^2 - (p_{n+2} + p_n)p + p_{n+2}p_n \approx p^2 - 2p_{n+1}p + p_{n+1}^2 \]
\[ (-p_{n+2} + 2p_{n+1} - p_n)p \approx -p_{n+2}p_n + p_{n+1}^2 \]
\[ p \approx \frac{p_{n+2}p_n - p_{n+1}^2}{p_{n+2} - 2p_{n+1} + p_n}. \]

Therefore, we may take any three consecutive terms of $(p_n)$ and predict $p$ from this formula. For large enough $n$, this prediction will be a much better estimate of $p$ than is $p_n$. But just as we were able to claim $|(p - p_{n+2})(p - p_n)| \approx |p - p_{n+1}|^2$, it must also be the case that $p_{n+2}p_n \approx p_{n+1}^2$, so the numerator of our approximation is nearly zero. Of course, that means the denominator must be nearly zero as well, since the quotient is $p$, a value that may not be zero. To avoid some of the error inherent in this calculation, it is advisable to compute the algebraically equivalent approximation
\[ p \approx p_n - \frac{(p_{n+1} - p_n)^2}{p_{n+2} - 2p_{n+1} + p_n} \qquad (2.3.4) \]
instead.

Let’s go back and revisit the sequence $(s_n)$ and apply this approximation.
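As a brief aside, the algebraic equivalence claimed for formula 2.3.4 is a short computation. Writing $D = p_{n+2} - 2p_{n+1} + p_n$ as shorthand for the common denominator (the symbol $D$ is introduced here only for this check), we have

```latex
\begin{align*}
p_n - \frac{(p_{n+1} - p_n)^2}{D}
  &= \frac{p_n(p_{n+2} - 2p_{n+1} + p_n) - (p_{n+1}^2 - 2p_{n+1}p_n + p_n^2)}{D} \\
  &= \frac{p_{n+2}p_n - 2p_{n+1}p_n + p_n^2 - p_{n+1}^2 + 2p_{n+1}p_n - p_n^2}{D} \\
  &= \frac{p_{n+2}p_n - p_{n+1}^2}{p_{n+2} - 2p_{n+1} + p_n}.
\end{align*}
```

The two forms agree in exact arithmetic. In 2.3.4 the nearly-zero-over-nearly-zero quotient enters only as a correction to $p_n$ rather than as the whole estimate, which is one way to see why it tends to suffer less from rounding.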


Define a n = s n — 


(s„ + l-s„) 

n + 2 — 2-Sn + l+Sr, 


and consider table 2.3 comparing the two sequences (s n ) and {a n 


Table 2.3: Comparing fixed point iteration when the derivative at the fixed point is not zero, $s_n$, to the Aitken’s delta-squared sequence, $a_n$.

n   s_n                  |2 - s_n|        a_n                  |2 - a_n|
0   1.75                 2.5(10)^{-1}     1.99506842493985     4.931(10)^{-3}
1   2.107421875          1.074(10)^{-1}   1.999022858310434    9.771(10)^{-4}
2   1.943559146486223    5.644(10)^{-2}   1.999737171760319    2.628(10)^{-4}
3   2.027401559734717    2.740(10)^{-2}   1.999937151202653    6.284(10)^{-5}
4   1.986114080555812    1.388(10)^{-2}   1.999983969455146    1.603(10)^{-5}
5   2.006894420349172    6.894(10)^{-3}
6   1.996540947531514    3.459(10)^{-3}


The sequence $(a_n)$ converges significantly faster than the linearly convergent sequence from which it was derived, just as before! The fact that $|2 - a_n| \approx |2 - s_{n+2}|^2$ is evidence of this claim, but the convergence of $(a_n)$ is still linear. Make sure you can calculate the $a_n$ in this table yourself before reading on.

On a practical note, there is no sense in calculating all the terms $a_0, a_1, \ldots, a_{n-2}$ as done in the table. The terms of $(a_n)$ are dependent only on those of $(s_n)$ so $a_{n-2}$ can be calculated just as well without having calculated $a_0, a_1, \ldots, a_{n-3}$. The table shows all of them only for illustrative purposes and so you can get some practice with formula 2.3.4. The important thing to notice is that $a_n$ has approximately twice as many significant digits of accuracy as does $s_{n+2}$. Consequently, $a_0$ is a much better approximation than is $s_2$.


Crumpet 11: Aitken’s delta-squared method is designed for any linearly convergent sequence, not just sequences derived from fixed point iteration.


The derivation of 2.3.4, referred to as Aitken’s delta-squared formula, makes no reference to fixed point iteration. 
In fact it makes no assumptions about the origin of the sequence. It makes no difference. It may be a sequence of 
partial sums, a sequence of partial products, a sequence derived from any recurrence relation, a sequence derived 
from number theory, or anything else. The only important characteristics are that the sequence converges and it 
does so linearly. 


4 This will happen in the common events that the $x - x_n$ all have the same sign or the $x - x_n$ have alternating signs, so this is not an unrealistic assumption.

Table 2.4: Steffensen’s method applied to f(x) = cos x.

n   a_n                  g(a_n)               g(g(a_n))            |a_n - c|        |a_{n+1} - c| / |a_n - c|^2
0   1                    .5403023058681398    .8575532158463934    2.609(10)^{-1}   .162
1   .7280103614676171    .7464997560452203    .7340702837365296    1.107(10)^{-2}   .148
2   .7390669669086738    .7390973701357808    .7390768902228948    1.816(10)^{-5}   .148
3   .7390851331660755    .739085133248225     .739085133192888     4.908(10)^{-11}  .148
4   .7390851332151607                                              3.063(10)^{-17}


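The sum $1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \cdots$ converges to $\frac{\pi}{4}$ linearly so Aitken’s delta-squared method should be helpful. If we let $p_n = \sum_{k=0}^{n} \frac{(-1)^k}{2k+1}$ be the $n$th partial sum, then $p_2 = \frac{13}{15}$, $p_3 = \frac{76}{105}$, $p_4 = \frac{263}{315}$, and $p_5 = \frac{2578}{3465}$. Aitken’s extrapolation gives

\[
a_2 = \frac{13}{15} - \frac{\left(\frac{76}{105} - \frac{13}{15}\right)^2}{\frac{263}{315} - 2\cdot\frac{76}{105} + \frac{13}{15}} = \frac{1321}{1680}
\quad\text{and}\quad
a_3 = \frac{76}{105} - \frac{\left(\frac{263}{315} - \frac{76}{105}\right)^2}{\frac{2578}{3465} - 2\cdot\frac{263}{315} + \frac{76}{105}} = \frac{989}{1260};
\]

\[
\frac{\left|\frac{\pi}{4} - p_4\right|^2}{\left|\frac{\pi}{4} - a_2\right|} \approx 2.6
\quad\text{and}\quad
\frac{\left|\frac{\pi}{4} - p_5\right|^2}{\left|\frac{\pi}{4} - a_3\right|} \approx 3.5
\]

so extrapolation gives an error less than the square of the error in the original sequence.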
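The arithmetic in this example can be checked with a few lines of Octave. This is only a sketch: the variable names `p`, `d1`, `d2`, and `a` are choices made here rather than notation from the text, and because Octave arrays start at index 1, `p(1)` holds $p_0$ and `a(3)` holds $a_2$.

```octave
% Aitken's delta-squared extrapolation (formula 2.3.4) applied to the
% partial sums of 1 - 1/3 + 1/5 - 1/7 + ..., which converge to pi/4.
k = 0:5;
p = cumsum((-1).^k ./ (2*k + 1));        % p(1) = p_0, ..., p(6) = p_5
m = length(p);
d1 = p(2:m-1) - p(1:m-2);                % p_{n+1} - p_n
d2 = p(3:m) - 2*p(2:m-1) + p(1:m-2);     % p_{n+2} - 2p_{n+1} + p_n
a = p(1:m-2) - d1.^2 ./ d2;              % a(1) = a_0, ..., a(4) = a_3
printf("a_2 = %.15f   (1321/1680 = %.15f)\n", a(3), 1321/1680);
printf("a_3 = %.15f   (989/1260  = %.15f)\n", a(4), 989/1260);
ratio2 = abs(pi/4 - p(5))^2 / abs(pi/4 - a(3));   % p_4 compared with a_2
ratio3 = abs(pi/4 - p(6))^2 / abs(pi/4 - a(4));   % p_5 compared with a_3
printf("ratios: %.2f and %.2f\n", ratio2, ratio3);
```

Because the whole sequence $(a_n)$ is computed at once from shifted copies of `p`, no loop is needed; the same three lines work for any linearly convergent sequence stored in `p`.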


Perhaps this fact gives you an idea. Once $s_2$ is calculated, we can use equation 2.3.4, also known as Aitken’s delta-squared method, to calculate a better approximation than we already have. And once we have this good approximation, it seems a bit silly to cast it aside and continue computing $s_3 = f(s_2)$, $s_4 = f(s_3)$, and so on. What if we use $a_0$ in place of $s_3$ in our iteration? In other words, we would have $s_1 = f(s_0)$, $s_2 = f(s_1)$, $s_3 = a_0$, $s_4 = f(s_3)$, and so on. That should improve $s_3$, $s_4$, and $s_5$. And once we have $s_5$ we again have three consecutive fixed point iterations, so we can apply Aitken’s delta squared method again. Instead of calculating $s_6 = f(s_5)$, we can get what should be a better approximation by using equation 2.3.4 on $s_3$, $s_4$, and $s_5$. In other words, $s_6 = a_3$, $s_7 = f(s_6)$, $s_8 = f(s_7)$. Again, we have three consecutive fixed point iterations, so $s_9 = a_6$, and so on. This gives the sequence

1.75,               2.107421875,        1.943559146486222,
1.995068424939850,  2.002459692429676,  1.998768643123618,
1.999997974970982,  2.000001012513483,  1.999999493743001,
1.999999999999658,  2.000000000000170,  1.999999999999914,
1.999999999999999,



which converges to 2 very quickly compared to $(s_n)$. If we consider the calculations of $s_1, s_2, s_4, s_5, s_7, s_8, \ldots$ to be intermediary and focus on the subsequence $s_0, s_3, s_6, s_9, \ldots = s_0, a_0, a_3, a_6, \ldots$ as a sequence itself we have


1.75, 1.995068424939850, 1.999997974970982, 1.999999999999658, 1.999999999999999,... 


which converges very rapidly! The construction of this subsequence as a sequence in and of itself is called Steffensen’s method and the convergence is quadratic as long as $(s_n)$ is convergent. The following is a heuristic argument that Steffensen’s method gives quadratic convergence. As seen, the error in $s_2$ is not significantly different from the error in $s_0$. But $a_0$ has an error approximately equal to the square of the error in $s_2$, so the error in $a_0$ is approximately the square of the error in $s_0$. Similarly, the error in $s_5$ is not significantly different from that in $a_0 = s_3$. But the error in $a_1$ is approximately the square of the error in $s_5$, so the error in $a_1$ is approximately the square of the error in $a_0$. Similarly, the error in $a_{n+1}$ is approximately the square of the error in $a_n$.

Applying Steffensen’s method to the function $f(x) = \cos x$ with $x_0 = 1$, we can accelerate the convergence of the sequence $(c_n)$ dramatically. Table 2.4 shows the first few terms of $(a_n)$ with some error analysis. The last column of the table indicates that

\[
\lim_{n\to\infty} \frac{|a_{n+1} - c|}{|a_n - c|^2} \approx .148
\]

and, consequently, that the sequence $(a_n)$ converges quadratically.
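The $a_n$ column of Table 2.4 can be reproduced with a short script. This is a sketch under a few assumptions: the names `g`, `s1`, `s2`, and `denom` are chosen here for illustration, the value used for `c` is simply the entry for $a_4$ in Table 2.4, and the loop runs a fixed four steps rather than testing a tolerance.

```octave
% Steffensen's method on g(x) = cos(x) starting from a_0 = 1.
g = @(x) cos(x);
c = 0.7390851332151607;          % a_4 from Table 2.4, used as a stand-in for c
a = zeros(1, 5);
a(1) = 1;                        % a(1) holds a_0
for n = 1:4
  s1 = g(a(n));                  % one fixed point step
  s2 = g(s1);                    % a second fixed point step
  denom = s2 - 2*s1 + a(n);
  if (denom == 0)                % guard against division by zero
    a = a(1:n);
    break;
  end
  a(n+1) = a(n) - (s1 - a(n))^2/denom;   % Aitken's delta-squared step
  printf("a_%d = %.16f   error = %.3e\n", n, a(n+1), abs(a(n+1) - c));
end
```

Stopping after four steps is deliberate: by then the differences `s1 - a(n)` are near the limit of double precision, so further steps would mostly amplify rounding error.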

Finally, we have two ways to get quick convergence from fixed point iteration. One, we simply iterate when the 
function has derivative zero at the fixed point. Two, we use Steffensen’s method. 




Figure 2.3.1: Convergence diagrams for 5 functions with the same fixed points, using Steffensen’s method.

[Diagram: one row for each of $f_1$ through $f_5$, with initial values from $-2$ to $3$ on the horizontal axis.]

black: does not converge; green: converges to 3; red: converges to $1 + \sqrt{3}$; blue: converges to $1 - \sqrt{3}$

Convergence Diagrams 

Speeding up fixed point iteration only takes care of one deficiency of the method. There is still the problem of divergence from fixed points where the derivative of the function has magnitude equal to or greater than 1. Steffensen’s method helps. Compare Figure 2.3.1 with Figure 2.2.6. The convergence diagrams for Steffensen’s method show convergence over larger intervals of initial values. Moreover, where $f_1$ and $f_2$ are concerned, Steffensen’s method finds all three fixed points, just as fixed point iteration on $f_6$ did.


Steffensen’s Method (pseudo-code) 

Since Steffensen’s method is particularly prone to floating-point error, we do a preliminary check for convergence 
before the Aitken’s delta-squared step. This helps prevent large errors or division by zero in Step 4. 


Assumptions: Fixed point iteration converges to a fixed point of $f$ with initial value $x_0$.
Input: Initial value $x_0$; function $f$; desired accuracy tol; maximum number of iterations $N$.
Step 1: For $j = 1 \ldots N$ do Steps 2-6:
    Step 2: Set $x_1 = f(x_0)$; $x_2 = f(x_1)$
    Step 3: If $|x_2 - x_1| <$ tol then return $x_2$
    Step 4: Set $x = x_0 - \frac{(x_1 - x_0)^2}{x_2 - 2x_1 + x_0}$
    Step 5: If $|x - x_0| <$ tol then return $x$
    Step 6: Set $x_0 = x$
Step 7: Print “Method failed. Maximum iterations exceeded.”
Output: Approximation $x$ near exact fixed point, or message of failure.


Key Concepts 

Aitken’s delta-squared method: If $(p_n)$ converges to $p$ linearly, the sequence $(a_n)$ defined by $a_n = p_n - \frac{(p_{n+1} - p_n)^2}{p_{n+2} - 2p_{n+1} + p_n}$ converges to $p$ superlinearly.

Fixed Point Error Bound: Let $f$ be a differentiable function with fixed point $\hat{x}$ and let $[a, b]$ be an interval containing $\hat{x}$. If $|f'(x)| \leq M < 1$ for all $x \in [a, b]$ and $f([a, b]) \subseteq [a, b]$, then for any initial value $x_0 \in [a, b]$, fixed point iteration, with $x_{k+1} = f(x_k)$ for all $k \geq 0$, gives an approximation of $\hat{x}$ with absolute error no more than $M^k |x_0 - \hat{x}|$.

Fixed Point Iteration Order of Convergence: Suppose $f$ is a function with fixed point $\hat{x}$ and $f'(\hat{x})$ exists. Let $x_0, x_1, x_2, \ldots$ be a sequence derived from fixed point iteration ($x_{k+1} = f(x_k)$ for all $k \geq 0$) such that $\lim_{k\to\infty} x_k = \hat{x}$ and $x_k \neq \hat{x}$ for all $k = 0, 1, 2, \ldots$. Then the sequence $(x_n)$ converges linearly to $\hat{x}$ if $f'(\hat{x}) \neq 0$ and at least quadratically if $f'(\hat{x}) = 0$.

Steffensen’s method: A modification of fixed point iteration where every third term is calculated using Aitken’s delta-squared method.

Superlinear convergence: If the sequence $p_0, p_1, p_2, \ldots$ converges to $p$ and $\lim_{k\to\infty} \frac{|p_{k+1} - p|}{|p_k - p|} = 0$, then the sequence is said to converge superlinearly.

Superquadratic convergence: If the sequence $p_0, p_1, p_2, \ldots$ converges to $p$ and $\lim_{k\to\infty} \frac{|p_{k+1} - p|}{|p_k - p|^2} = 0$, then the sequence is said to converge superquadratically.


Octave 

In section 1.3, we learned about for loops. With a for loop, you have to know how many times you want the loop 
to run or at least you need a maximum. You can quit a for loop before it is done by exiting (returning) from 
the function. There are times, however, when you don’t know how many times you need a loop to run and you 
don’t even have a convenient maximum at hand. In this case, a while loop is more appropriate. A while loop will 
continue to loop as long as a certain condition is met, and you set the condition. The syntax for a while loop is 

while (condition) 
do something. 
end%while 

While loops must be used with caution, though. For loops always have an end, but while loops do not if programmed carelessly. If the condition of a while loop never becomes false, the loop runs indefinitely! Here is a simple example of a while loop that never ends. Do not run it!

i=0;
while (i<12)
  disp("Help! I'm stuck in a never-ending loop!!")
end%while

The problem is i is set less than 12 and never changes so always remains less than 12. Thus the condition of this while loop is always met. This loop can easily be modified to terminate. If we increment i inside the loop, it will end. This modification of the never-ending loop does end and displays a message 12 times:

i=0;
while (i<12)
  disp("That's better. I can handle a dozen iterations.")
  i=i+1;
end%while

Incidentally, any for loop can be replaced by a while loop like this one. 
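To make the correspondence concrete, here is a small sketch that performs the same task both ways. The task (adding up the integers 1 through 12) and the variable names `total_for` and `total_while` are made up for illustration.

```octave
% Sum the integers 1 through 12 with a for loop.
total_for = 0;
for i = 1:12
  total_for = total_for + i;
end

% The same sum with a while loop: the counter must be
% initialized and incremented by hand.
total_while = 0;
i = 1;
while (i <= 12)
  total_while = total_while + i;
  i = i + 1;               % without this increment the loop never ends
end

disp([total_for, total_while])   % both totals are 78
```

The for loop handles initialization and incrementing automatically, which is why it cannot run forever; the while loop leaves both to the programmer.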

We are human. Inevitably, we will program a while loop that never ends. What to do once it starts running? 
Of course, you can power down the machine, but that is a little like bringing your coffee mug to the kitchen using a
bulldozer. There is an easier way. You can simply stop the application in which you are running Octave. If you are
using a command line (terminal) window or the Octave GUI, you can simply close it. But, if you remember, you 
can also press Ctrl-c. That is, tap the c key while holding down the Ctrl key. This will interrupt the never-ending 
loop. 

For a more practical example, the bisection method can easily be re-programmed using a while loop. First, the 
pseudo-code: 

Assumptions: $f$ is continuous on $[a, b]$. $f(a)$ and $f(b)$ have opposite signs.

Input: Interval $[a, b]$; function $f$; desired accuracy tol.

Step 1: Set $m = \frac{a+b}{2}$; $err = |b - a|$; $L = f(a)$;
Step 2: While $err >$ tol do Steps 3-5:
    Step 3: Set $m = \frac{a+b}{2}$; $M = f(m)$; $err = err/2$;
    Step 4: If $M = 0$ then return $m$;
    Step 5: If $LM < 0$ then set $b = m$; else set $a = m$ and $L = M$;
Step 6: Return $m$.

Output: Approximation $m$ within tol of exact root.

Now the Octave code. If you decide to use this code, it should be saved in a file named bisectionWhile.m.

function p = bisectionWhile(f,a,b,tol)
  p = a + (b-a)/2;
  err = abs(b-a);
  FA = f(a);
  while (err>tol)
    p = a + (b-a)/2;
    FP = f(p);
    err = err/2;
    if (FP == 0)
      return
    end%if
    if (FA*FP > 0)
      a = p;
      FA = FP;
    else
      b = p;
    end%if
  end%while
end%function

Use this code with caution! It can run as a never-ending loop! If the function is called with a negative value for tol, as in bisectionWhile(g,1,2,-10), it will run until forcibly stopped (using Ctrl-c or shutting down the Octave app) as err will always be greater than $-10$.

Error checking 

The most useful software includes error checking. In the case of the bisectionWhile function, we want to avoid 
the endless loop in every instance we can imagine. Adding a couple lines at the beginning of the function provides 
some security: 

function p = bisectionWhile(f,a,b,tol)
  if (tol<=0)
    p = "ERROR: tol must be positive.";
    return
  end%if
  p = a + (b-a)/2;
  err = abs(b-a);
  FA = f(a);
  while (err>tol)
    p = a + (b-a)/2;
    FP = f(p);
    err = err/2;
    if (FP == 0)
      return
    end%if
    if (FA*FP > 0)
      a = p;
      FA = FP;
    else
      b = p;
    end%if
  end%while
end%function

In general, having your program check for input errors like this is called error checking or validation. Most of the time, we will write code assuming the input is valid and will not do any error checking. This makes the programming simpler, but also allows for problems like never-ending loops! bisectionWhile.m may be downloaded at the companion website.


Exercises 

1. Supply the proof that $x_k \in [a, b]$ for all $k \geq 0$ in proposition 5.


2. Show that

\[
\frac{p_{n+2}p_n - p_{n+1}^2}{p_{n+2} - 2p_{n+1} + p_n}
\quad\text{and}\quad
p_n - \frac{(p_{n+1} - p_n)^2}{p_{n+2} - 2p_{n+1} + p_n}
\]

are algebraically equivalent.


3. Write an Octave function that implements Steffensen’s 
method. 

4. Write an Octave program (.m file) that uses a while loop and the disp() command to output the first 10 powers of 5 starting with $5^0$.

5. Write an Octave program (.m file) that uses a while loop, an array, and the disp() command to find the

o 2 ™ _ o 

values of f(n) = ^ on ^ for n = 0, 1, 2, 4, 6, 10. ^ 

6. Write an Octave program (.m file) that uses
a while loop, an array, and the disp() command

2n 

to find the values of f(n)= — . for n = 

s/n 2 + 3 n 

0,2,5,10,100,1000,20000. 

7. The following Octave code is intended to calculate
the sum

f I 

^ k 2 

k = 1 

but it does not. Find as many mistakes in the code as you can. Classify each mistake as either a compilation error (an error that will prevent the program from running at all) or a bug (an error that will not prevent the program from running, but will cause improper calculation of the sum).

sum=1;
k=1;
while k<30
  sum=sum+1.0/k*k;
end
diss(sum)


8. Write a while loop that outputs the sequence of numbers.

(a) 7, 8, 9, 10, 11, 12, 13, 14, 15

(b) 20, 19, 18, 17, 16, 15, 14, 13

(c) 12, 12.333, 12.667, 13, 13.333, 13.667, 14

(d) 1, 9, 25, 49, 81, 121, 169, 225, 289, 361, 441

(e) 1, .5, .25, .125, .0625, .03125, .015625

9. The function $g(x) = \sqrt[3]{5 - 3x}$ satisfies the hypotheses of proposition 5 over the interval [1, 1.3]. Find a bound on the number of iterations required to find the fixed point to within $10^{-5}$ accuracy starting with initial value $x_0$ of your choice.

10. Fixed point iteration on the function $g(x) = \sqrt[3]{x^2 + x}$ will converge to approximately 1.618033988749895 for any $x_0$ in [0.5, 3.5]. [A]

(a) Find a bound on the number of iterations it will take to achieve $10^{-4}$ accuracy with $x_0 = 2.5$.

(b) How many iterations does it actually take to achieve $10^{-4}$ accuracy with $x_0 = 2.5$?

11. Let f(x) = 'g(„ T* ■ In exercise 10 of section 2.2, you were asked to show that $f$ has a unique fixed point on $[-4, -0.9]$. [S]

(a) Find a bound on the number of iterations required to approximate the fixed point to within $10^{-14}$ accuracy using fixed point iteration with any initial value in $[-4, -0.9]$.

(b) Use fixed point iteration with $x_0 = -4$ to find an approximation to the fixed point that is accurate to within $10^{-11}$. The fixed point is $x = -1$.

(c) Compare the bound to the actual number of iterations needed.

12. Let $g(x) = \pi + 0.5\sin(x/2)$. In exercise 11 of section 2.2, you were asked to show that $g$ has a unique fixed point on $[0, 2\pi]$.

(a) Find a bound on the number of iterations required to achieve $10^{-2}$ accuracy using fixed point iteration with any initial value in $[0, 2\pi]$.

(b) Use fixed-point iteration with $x_0 = 0$ to find an approximation to the fixed point that is accurate to within $10^{-2}$. The fixed point is x =???.

(c) Compare the bound to the actual number of iterations needed.

13. Calculate two iterations of Steffensen’s method for $g(x) = \sqrt[3]{x^2 + x}$ with $x_0 = 2.5$.

14. Use Steffensen’s method to find the root of $g(x) = x^4 - 2x^3 - 4x^2 + 4x + 4$ in [2, 3] accurate to five significant digits. [A]



15. Compute $a_0$, $a_1$, and $a_2$ of Aitken’s delta-squared
method for the sequence in problem 2 on page 27.
Since the sequence has an undefined term at $n = 1$,
start the sequence $(p_n)$ with $n = 2$. In other words,
consider the sequence in problem 2 on page 27 to be 
3, 2, § , | , | . . . so po = 3, pi = 2, p 2 = | , and so on. 

16. The following sequences are linearly convergent. Generate the first five terms of the sequence $(a_n)$ using Aitken’s delta-squared calculation.

(a) $p_0 = 0.5$, $p_n = (2 - e^{p_{n-1}} + p_{n-1}^2)/3$ for $n \geq 1$

(b) po = 0.75, p„ = / 3 for n > 1 

17. Use Aitken’s delta squared method to find $p = \lim_{n\to\infty} p_n$ accurate to 3 decimal places.

$(p_n) = \{-2, -1.85271, -1.74274, -1.66045, -1.59884, -1.55266, -1.51804, -1.49208, -1.47261, \ldots\}$

18. The sequence $(a_n)$ of question 15 converges faster than does the sequence in problem 2 on page 27. If you were to apply Aitken’s delta-squared method to the sequence $(a_n)$, would you expect the convergence to be even faster? Explain.

19. Recall from calculus that $\lim_{n\to\infty} n\sin\left(\frac{1}{n}\right) = 1$. Therefore, if we let $p_n = n\sin\left(\frac{1}{n}\right)$, then the sequence $(p_1, p_2, p_3, \ldots) \approx (.84147, .95885, .98158, \ldots)$ converges to 1, albeit very slowly. Generate the first three terms of the sequence $(a_n)$ using Aitken’s delta-squared calculation. Does it seem to be approaching 1 faster than does $(p_n)$?

20. Fixed point iteration applied to $f(x) = \sin(x)$ with $x_0 = 1$ takes 29,992 iterations to reach a number below 0.01 on its way to the fixed point 0. Incidentally, $x_{29992} \approx 0.0099999$. How many iterations does it take Steffensen’s method with $x_0 = 1$ to reach a number below 0.01? Comment.

21. Let $f(x) = 1 + (\sin x)^2$ and $p_0 = 1$. Find $a_1$ and $a_2$ of Steffensen’s method with a calculator.

22. Compute the first three iterations of Steffensen’s method applied to $g(x) = (\sqrt{2})^x$ using $p_0 = 3$.

23. Steffensen’s method is applied to a function $f(x)$ using $p_0 = 1$. If $f(f(p_0)) = 3$ and $a_1 = 0.75$, what is $f(p_0)$? [A]

24. Find the fixed point of $f(x) = x - 0.002(e^x\cos(x) - 100)$ in [5, 6] using Steffensen’s method.

25. In question 24 you found a fixed point $\hat{x}$. For what function $g(x)$ is $\hat{x}$ a root?

26. Write a while loop that outputs the numbers 1, .5, .25, .125, .0625, .03125, .015625, . . . until it reaches a number below $10^{-4}$.

2.4 Newton’s Method


In section 2.3 we addressed some of the deficiency in fixed point iteration, but delayed deep discussion of the mysterious function $f_6$ of the root finding investigation on page 52. The time has come to discuss $f_6$ in some detail. We start with some number crunching. Recall that $f_6(x) = \frac{2x^3 - 5x^2 - 6}{3x^2 - 10x + 4}$ and let $x_0 = 4$. Proceeding with fixed point iteration,

\[
\begin{aligned}
x_1 &= f_6(x_0) = 3.5 \\
x_2 &= f_6(x_1) \approx 3.217391304347826 \\
x_3 &= f_6(x_2) \approx 3.072749058541597 \\
x_4 &= f_6(x_3) \approx 3.013730618589344 \\
x_5 &= f_6(x_4) \approx 3.000683798275568 \\
x_6 &= f_6(x_5) \approx 3.000001860777997 \\
x_7 &= f_6(x_6) \approx 3.000000000013848.
\end{aligned}
\]

You can see two things. The sequence $x_0, x_1, x_2, \ldots$

1. is converging to (the fixed point) 3; and

2. it looks like the convergence is quadratic since, starting with $x_4$ to $x_5$, the number of significant digits is roughly doubling with each iteration.


In the analysis in section 2.3 on page 56, we found that fixed point iteration converges quadratically (or better) only when the derivative at the fixed point is zero. These observations should lead you to believe $f_6'(3) = 0$. Let’s check. First, the derivative $f_6'(x) = \frac{6x^4 - 40x^3 + 74x^2 - 4x - 60}{(3x^2 - 10x + 4)^2}$ (you should verify this). Evaluating the numerator at the fixed point, $x = 3$, we get $6(3)^4 - 40(3)^3 + 74(3)^2 - 4(3) - 60 = 486 - 1080 + 666 - 12 - 60 = 0$. So we have convergence to a fixed point where the derivative of the function is zero, and we indeed have that convergence is quadratic.
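These iterates and the derivative check are easy to reproduce in Octave. The following is a sketch: the name `f6` and the storage of $x_k$ in `x(k+1)` (Octave arrays start at index 1) are choices made here.

```octave
% Fixed point iteration on f_6 starting from x_0 = 4.
f6 = @(x) (2*x^3 - 5*x^2 - 6)/(3*x^2 - 10*x + 4);
x = zeros(1, 8);
x(1) = 4;                                   % x(1) holds x_0
for k = 1:7
  x(k+1) = f6(x(k));                        % x(k+1) holds x_k
  printf("x_%d = %.15f   error = %.3e\n", k, x(k+1), abs(x(k+1) - 3));
end

% Numerator of f_6'(x) evaluated at the fixed point x = 3.
numer = 6*3^4 - 40*3^3 + 74*3^2 - 4*3 - 60;
disp(numer)
```

Watching the error column is the quickest way to see the quadratic behavior: once the iterates are close to 3, each error is roughly a constant multiple of the square of the previous one.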

Starting with $x_0 = 2$, fixed point iteration on $f_6$ converges to $1 + \sqrt{3}$, and starting with $x_0 = -1$, fixed point iteration converges to $1 - \sqrt{3}$. You should be able to verify this from the convergence diagram in Figure 2.2.6 or from calculating the first several iterations for each yourself. What you do not get from the convergence diagram is the speed of convergence. For that, you need to look at the iterates. You should do so. Does convergence look quadratic in these cases too? Answer on page 72.

From the convergence diagram, we see that fixed point iteration will converge for virtually any initial value, and all three fixed points can be estimated by fixed point iteration. Moreover, from our calculations, it looks like convergence is quadratic for all three. It’s hard to ask for more from a function. Fast convergence to any fixed point! So whence did $f_6$ come?

Suppose $g(x)$ is differentiable and $g(\hat{x}) = 0$ so $g$ has a root at $\hat{x}$. Consider $f(x) = x - \frac{g(x)}{g'(x)}$. $\hat{x}$ is a fixed point of $f$ as long as $g'(\hat{x}) \neq 0$:

\[
f(\hat{x}) = \hat{x} - \frac{g(\hat{x})}{g'(\hat{x})} = \hat{x} - \frac{0}{g'(\hat{x})} = \hat{x}.
\]

Moreover, as long as $g$ has a second derivative near $\hat{x}$,

\[
f'(\hat{x}) = 1 - \frac{g'(\hat{x}) \cdot g'(\hat{x}) - g(\hat{x})g''(\hat{x})}{g'(\hat{x}) \cdot g'(\hat{x})}
= 1 - 1 + \frac{0 \cdot g''(\hat{x})}{g'(\hat{x}) \cdot g'(\hat{x})}
= 0.
\]

From these calculations, we conclude if $g(x)$ is twice differentiable, $g(\hat{x}) = 0$ and $g'(\hat{x}) \neq 0$, then fixed point iteration of $f(x)$ with initial value in a neighborhood of $\hat{x}$ will converge quadratically to $\hat{x}$. What a great way to turn a root finding problem into a fixed point problem!

Now is a good time to recall that $f_6$ was just one of 6 candidate functions designed to find the roots of $g(x) = -x^3 + 5x^2 - 4x - 6$ by fixed point iteration. Indeed, $g'(x) = -3x^2 + 10x - 4$ and

\[
x - \frac{g(x)}{g'(x)} = x - \frac{-x^3 + 5x^2 - 4x - 6}{-3x^2 + 10x - 4}
= \frac{2x^3 - 5x^2 - 6}{3x^2 - 10x + 4}
= f_6(x).
\]

Using fixed point iteration on $f(x) = x - \frac{g(x)}{g'(x)}$ to find roots of $g(x)$, as done here, is called Newton’s method.

A Geometric Derivation of Newton’s Method 

The following figure shows how to compute the first two iterations of Newton’s method on $g(x) = -x^3 + 5x^2 - 4x - 6$ with initial value $x_0 = -2.5$ geometrically.

To compute $x_1$, the tangent line to $g$ at $(x_0, g(x_0))$ is drawn and its intersection with the x-axis is $x_1$. Similarly, the tangent line to $g$ at $(x_1, g(x_1))$ is drawn and its intersection with the x-axis is $x_2$. And so on. For example, $(x_0, g(x_0)) = (-2.5, 50.875)$ and $g'(x_0) = g'(-2.5) = -47.75$. Hence, the “rise” $(0 - 50.875)$ over the “run” $(x_1 + 2.5)$ between $(-2.5, 50.875)$ and $(x_1, 0)$ must equal $-47.75$. We thus have $\frac{-50.875}{x_1 + 2.5} = -47.75$ so

\[
x_1 = \frac{-50.875}{-47.75} - 2.5 \approx -1.43455497382199.
\]

In symbols, the “rise” $(-g(x_0))$ over the “run” $(x_1 - x_0)$ must equal $g'(x_0)$. In other words,

\[
\frac{-g(x_0)}{x_1 - x_0} = g'(x_0)
\;\Rightarrow\;
\frac{-g(x_0)}{g'(x_0)} = x_1 - x_0
\;\Rightarrow\;
x_1 = x_0 - \frac{g(x_0)}{g'(x_0)}.
\]

Similar calculation shows $x_2 = x_1 - \frac{g(x_1)}{g'(x_1)}$, and more generally $x_{n+1} = x_n - \frac{g(x_n)}{g'(x_n)}$. This recurrence relation describes Newton’s method: iterating the function $f(x) = x - \frac{g(x)}{g'(x)}$.
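The recurrence can be tried directly in Octave. The sketch below makes several assumptions that are not from the text: the names `g`, `dg`, `xold`, `xnew`, and `root`, the tolerance `1e-12`, and the cap of 50 iterations are all chosen here for illustration, and the derivative is entered by hand.

```octave
% Newton's method for g(x) = -x^3 + 5x^2 - 4x - 6 starting from x_0 = -2.5.
g  = @(x) -x^3 + 5*x^2 - 4*x - 6;
dg = @(x) -3*x^2 + 10*x - 4;        % derivative of g, entered by hand
tol = 1e-12;                         % hypothetical tolerance
N = 50;                              % hypothetical cap on iterations
x0 = -2.5;
x1 = x0 - g(x0)/dg(x0);              % the first iterate computed in the text
printf("x_1 = %.14f\n", x1);

root = NaN;                          % stays NaN if the loop never converges
xold = x0;
for j = 1:N
  xnew = xold - g(xold)/dg(xold);    % x_{n+1} = x_n - g(x_n)/g'(x_n)
  if (abs(xnew - xold) < tol)
    root = xnew;
    break;
  end
  xold = xnew;
end
printf("approximate root: %.15f\n", root);
```

From this starting value the iterates settle on the root $1 - \sqrt{3}$; other starting values may lead to one of the other two roots.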


Newton’s Method (pseudo-code) 

Unlike Steffensen’s method, the denominator appearing in Newton’s method is not expected to approach zero as the iterates converge, so generally there is much less trouble with stability of the calculation and no intermediate checks are done before computing one iteration from the previous.

Assumptions: $g$ is twice differentiable. $g$ has a root at $\hat{x}$. $x_0$ is in a neighborhood $(\hat{x} - \delta, \hat{x} + \delta)$ where the magnitude of $f'(x) = 1 - \frac{g'(x)g'(x) - g(x)g''(x)}{g'(x)g'(x)}$ is less than one.

Input: Initial value $x_0$; function $g$ and its derivative $g'$; desired accuracy tol; maximum number of iterations $N$.

Step 1: For $j = 1 \ldots N$ do Steps 2-4:
    Step 2: Set $x = x_0 - \frac{g(x_0)}{g'(x_0)}$
    Step 3: If $|x - x_0| <$ tol then return $x$;
    Step 4: Set $x_0 = x$;
Step 5: Print “Method failed. Maximum iterations exceeded.”

Output: Approximation $x$ near exact fixed point, or message of failure.

Table 2.5: The secant method applied to $g(x) = -x^3 + 5x^2 - 4x - 6$ with $x_0 = 5$ and $x_1 = x_0 + g(x_0) = -21$.

n    x_n                |3 - x_n|
0    5                  2(10)^{0}
1    -21                2.4(10)^{1}
2    4.9415730337078    1.941(10)^{0}
3    4.8869924815972    1.886(10)^{0}
4    4.0502898397912    1.050(10)^{0}
5    3.7088949488497    7.088(10)^{-1}
6    3.412824115541     4.128(10)^{-1}
7    3.232292913133     2.322(10)^{-1}
8    3.1141957095727    1.141(10)^{-1}
9    3.0465011115969    4.650(10)^{-2}
10   3.0132833760752    1.328(10)^{-2}
11   3.0020189248976    2.018(10)^{-3}
12   3.0001014520965    1.014(10)^{-4}
13   3.0000008128334    8.128(10)^{-7}
14   3.0000000003297    3.297(10)^{-10}


Secant Method 

The greatest weakness of Newton’s method is the requirement that $g'$ be known and used in the calculation. The derivative is not always accessible or manageable or even known, though. In such a case, it is better to use Steffensen’s method or the secant method. The secant method is derived by replacing the $g'$ of Newton’s method with a difference quotient. In order for this to make any sense, though, we will need to restate Newton’s method in terms of $x_n$. In Newton’s method we are iterating $f(x) = x - \frac{g(x)}{g'(x)}$ so $x_{n+1} = x_n - \frac{g(x_n)}{g'(x_n)}$.

Now suppose you have a function $g$ and some iterate $x_{n-1}$. That is enough to locate one point on the graph of $g$, namely $(x_{n-1}, g(x_{n-1}))$. But we need another point in order to form a difference quotient (the slope of the line through two points). So suppose we have a second value, $x_n$, near $x_{n-1}$. Then $\frac{g(x_n) - g(x_{n-1})}{x_n - x_{n-1}} \approx g'(x_n)$ so we can substitute $\frac{g(x_n) - g(x_{n-1})}{x_n - x_{n-1}}$ for $g'(x_n)$ in Newton’s method. This yields the secant method, $x_{n+1} = x_n - g(x_n) \Big/ \frac{g(x_n) - g(x_{n-1})}{x_n - x_{n-1}}$, which simplifies to

\[
x_{n+1} = x_n - g(x_n)\,\frac{x_n - x_{n-1}}{g(x_n) - g(x_{n-1})}. \tag{2.4.1}
\]

Notice this is not quite a fixed point iteration scheme. Each iteration depends on the previous two values, not one. The analysis we’ve done so far does not apply, but there’s hope that convergence will be fast since this method is a reasonable approximation of Newton’s method near a root, assuming $g$ is differentiable near there. Table 2.5 provides evidence that the secant method indeed converges quickly. In the particular case of $g(x) = -x^3 + 5x^2 - 4x - 6$ with $x_0 = 5$ and $x_1 = x_0 + g(x_0) = -21$, it takes a while to settle in, but after the first 8 iterations or so, convergence is very fast. Not quite quadratic, but superlinear for sure.
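The entries of Table 2.5 can be regenerated with a direct use of formula 2.4.1. This sketch favors clarity over efficiency, so it evaluates $g$ twice per step; the names `gn` and `gm` and the storage of $x_n$ in `x(n+1)` are choices made here.

```octave
% Secant method (formula 2.4.1) for g(x) = -x^3 + 5x^2 - 4x - 6.
g = @(x) -x^3 + 5*x^2 - 4*x - 6;
x = zeros(1, 15);
x(1) = 5;                           % x(1) holds x_0
x(2) = x(1) + g(x(1));              % x(2) holds x_1 = x_0 + g(x_0) = -21
for n = 2:14
  gn = g(x(n));                     % g at the newest iterate
  gm = g(x(n-1));                   % g at the previous iterate
  x(n+1) = x(n) - gn*(x(n) - x(n-1))/(gn - gm);
end
for n = 1:15
  printf("%2d  %.13f  %.3e\n", n-1, x(n), abs(3 - x(n)));
end
```

Each pass through the loop needs only the two most recent iterates, which is the sense in which the method is not a one-point fixed point iteration.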


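Crumpet 12: The secant method converges with order $\frac{1+\sqrt{5}}{2}$

Suppose $g$ is a function with root $\hat{x}$, $g'(\hat{x}) \neq 0$, $g''(\hat{x}) \neq 0$, and $g'''(x)$ exists in a neighborhood of $\hat{x}$. Let $x_0, x_1, x_2, \ldots$ be a sequence derived from the secant method ($x_{n+1} = x_n - g(x_n)\frac{x_n - x_{n-1}}{g(x_n) - g(x_{n-1})}$ for $n \geq 1$) such that $\lim_{k\to\infty} x_k = \hat{x}$. Define $e_n = x_n - \hat{x}$ so $x_n = \hat{x} + e_n$. Making this substitution into 2.4.1 we have

\[
e_{n+1} = e_n - g(\hat{x} + e_n)\,\frac{e_n - e_{n-1}}{g(\hat{x} + e_n) - g(\hat{x} + e_{n-1})}. \tag{2.4.2}
\]

Taylor’s theorem allows $g(\hat{x} + e_k) = g(\hat{x}) + e_k g'(\hat{x}) + \frac{1}{2}e_k^2 g''(\hat{x}) + O(e_k^3)$. Noting that $g(\hat{x}) = 0$ and substituting into 2.4.2,

\[
\begin{aligned}
e_{n+1} &= e_n - (e_n - e_{n-1})\,\frac{e_n g'(\hat{x}) + \frac{1}{2}e_n^2 g''(\hat{x}) + O(e_n^3)}{(e_n - e_{n-1})g'(\hat{x}) + \frac{1}{2}(e_n^2 - e_{n-1}^2)g''(\hat{x}) + O(e_{n-1}^3)} \\
&= e_n - \frac{e_n + e_n^2\frac{g''(\hat{x})}{2g'(\hat{x})} + O(e_n^3)}{1 + \frac{(e_n + e_{n-1})g''(\hat{x})}{2g'(\hat{x})} + \frac{O(e_{n-1}^3)}{e_n - e_{n-1}}} \\
&= \frac{e_n\left(1 + \frac{(e_n + e_{n-1})g''(\hat{x})}{2g'(\hat{x})} + \frac{O(e_{n-1}^3)}{e_n - e_{n-1}}\right) - \left(e_n + e_n^2\frac{g''(\hat{x})}{2g'(\hat{x})} + O(e_n^3)\right)}{1 + \frac{(e_n + e_{n-1})g''(\hat{x})}{2g'(\hat{x})} + \frac{O(e_{n-1}^3)}{e_n - e_{n-1}}} \\
&= \frac{e_n e_{n-1}\frac{g''(\hat{x})}{2g'(\hat{x})} + \frac{e_n}{e_n - e_{n-1}}O(e_{n-1}^3) + O(e_n^3)}{1 + \frac{(e_n + e_{n-1})g''(\hat{x})}{2g'(\hat{x})} + \frac{O(e_{n-1}^3)}{e_n - e_{n-1}}}.
\end{aligned} \tag{2.4.3}
\]

Using equality 2.4.3 to find a value $\alpha$ for which $\lim_{n\to\infty} \frac{|\hat{x} - x_{n+1}|}{|\hat{x} - x_n|^\alpha} = \lambda \neq 0$, we have

\[
\lim_{n\to\infty} \frac{|\hat{x} - x_{n+1}|}{|\hat{x} - x_n|^\alpha}
= \lim_{n\to\infty} \frac{|e_{n+1}|}{|e_n|^\alpha}
= \lim_{n\to\infty} \left| \frac{e_n^{1-\alpha} e_{n-1}\frac{g''(\hat{x})}{2g'(\hat{x})} + \frac{e_n^{1-\alpha}}{e_n - e_{n-1}}O(e_{n-1}^3) + O(e_n^{3-\alpha})}{1 + \frac{(e_n + e_{n-1})g''(\hat{x})}{2g'(\hat{x})} + \frac{O(e_{n-1}^3)}{e_n - e_{n-1}}} \right|
= \lambda \neq 0.
\]

But $\lim_{n\to\infty} e_n = \lim_{n\to\infty} e_{n-1} = 0$. Hence, $\lim_{n\to\infty} e_n^{1-\alpha} e_{n-1}$ must not be 0 or divergent, for if it were, $\lim_{n\to\infty} \frac{|e_{n+1}|}{|e_n|^\alpha}$ would be 0 or divergent, respectively. Consequently, there is a positive constant $C$ such that $\lim_{n\to\infty} |e_n^{1-\alpha} e_{n-1}| = \lim_{n\to\infty} |e_{n+1}^{1-\alpha} e_n| = C \Rightarrow \lim_{n\to\infty} \frac{|e_{n+1}|}{|e_n|^{1/(\alpha-1)}} = C^{1/(1-\alpha)}$. Now we have

\[
\lim_{n\to\infty} \frac{|e_{n+1}|}{|e_n|^\alpha} = \lambda \neq 0
\quad\text{and}\quad
\lim_{n\to\infty} \frac{|e_{n+1}|}{|e_n|^{1/(\alpha-1)}} = C^{1/(1-\alpha)} \neq 0.
\]

Since the order of convergence of a sequence is unique (Exercise 20 of section 1.3) it must be that $\alpha = 1/(\alpha - 1)$ or $\alpha^2 - \alpha - 1 = 0$. The quadratic formula supplies the desired result.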


So far we have only applied Newton’s method and the secant method to the cubic polynomial $g(x) = -x^3 + 5x^2 - 4x - 6$, a task not strictly necessary. The rational roots theorem, a basic tool from pre-calculus, would give you the roots exactly. The method would have you check $\pm 1$, $\pm 2$, $\pm 3$, and $\pm 6$ as possible roots of $g$. Assuming you did your checks by synthetic division, your work might look something like this:

3 |  -1    5   -4   -6
  |       -3    6    6
  ----------------------
     -1    2    2    0

meaning $g(x) = (x - 3)(-x^2 + 2x + 2)$. The other two roots would then come from the quadratic formula applied to $-x^2 + 2x + 2$ and would be $\frac{-2 \pm \sqrt{4 + 8}}{-2} = 1 \pm \sqrt{3}$.


Crumpet 13: Solving the cubic 


The solutions of the quadratic equation ax 2 +bx+c = 0 are given by the well-known quadratic equation. Less well- 
known, and significantly more involved, is any formula for the solutions of the cubic equation ax 3 +bx 2 +cx+d = 0. 
One method of solution follows. First, we let 


P 

q 


3ac — b 2 
3 a 2 


and 


2b 3 - 9 abc + 27a 2 d 
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Then we set 


3 _ _ 9 _ , Pi 

2 V 4 + 27' 


Third, we set wi,W 2 , and w 3 to the three possible (complex) values of w. Finally, the three solutions of ax 3 + 
bx 2 + cx + d = 0 are 

p b 

Xi — Wi — — — — , i = 1,2,3. 

Swi 6a 

This is essentially the method of Cardano, published in the 16 th century! 

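For example, to solve the equation $-x^3 + 5x^2 - 4x - 6 = 0$, we start with

$$p = \frac{3(-1)(-4) - 5^2}{3(-1)^2} = -\frac{13}{3} \quad\text{and}\quad q = \frac{2 \cdot 5^3 - 9(-1)(5)(-4) + 27(-1)^2(-6)}{27(-1)^3} = \frac{92}{27}.$$

Then

$$w^3 = -\frac{92}{2 \cdot 27} - \sqrt{\frac{92^2}{4 \cdot 27^2} - \frac{13^3}{27^2}} = -\frac{46}{27} - \frac{\sqrt{92^2 - 4 \cdot 13^3}}{54} = -\frac{46}{27} - \frac{\sqrt{-324}}{54} = -\frac{46}{27} - \frac{i}{3}.$$

In polar form, $w^3 = \frac{13\sqrt{13}}{27} e^{i(\tan^{-1}(9/46) - \pi)}$, so we may set $w_1 = \frac{\sqrt{13}}{3} e^{i(\tan^{-1}(9/46) - \pi)/3}$, one of the cube roots of $w^3$. Unfortunately, finding the angle $(\tan^{-1}(9/46) - \pi)/3$ exactly amounts to solving a cubic equation! However, with a calculator in hand, one can get the approximation $-0.982793723247329$, which in the end will be good enough. So, the real part of $w_1$ is approximately $\frac{\sqrt{13}}{3}\cos(-0.982793723247329) \approx .6666666666666667$ and the imaginary part is approximately $\frac{\sqrt{13}}{3}\sin(-0.982793723247329) \approx -1$. $w_1$ is suspiciously close to $\frac{2}{3} - i$. And we can check,

$$\left(\tfrac{2}{3} - i\right)^3 = \left(\tfrac{2}{3}\right)^3 + 3\left(\tfrac{2}{3}\right)^2(-i) + 3 \cdot \tfrac{2}{3}(-i)^2 + (-i)^3 = \tfrac{8}{27} - \tfrac{4}{3}i - 2 + i = -\tfrac{46}{27} - \tfrac{1}{3}i.$$

Therefore, $w_1 = \frac{2}{3} - i$, and we let

$$w_2 = \left(\tfrac{2}{3} - i\right)\left(-\tfrac{1}{2} + \tfrac{\sqrt{3}}{2}i\right) = \tfrac{3\sqrt{3} - 2}{6} + \tfrac{2\sqrt{3} + 3}{6}i \quad\text{and}\quad w_3 = \left(\tfrac{2}{3} - i\right)\left(-\tfrac{1}{2} - \tfrac{\sqrt{3}}{2}i\right) = -\tfrac{3\sqrt{3} + 2}{6} + \tfrac{3 - 2\sqrt{3}}{6}i.$$

Finally, since $|w_i|^2 = \frac{13}{9}$ for each $i$,

$$x_1 = w_1 + \frac{13}{9w_1} + \frac{5}{3} = w_1 + \frac{13\overline{w_1}}{9|w_1|^2} + \frac{5}{3} = w_1 + \overline{w_1} + \frac{5}{3} = 3$$

$$x_2 = w_2 + \frac{13}{9w_2} + \frac{5}{3} = w_2 + \frac{13\overline{w_2}}{9|w_2|^2} + \frac{5}{3} = w_2 + \overline{w_2} + \frac{5}{3} = \sqrt{3} + 1$$

$$x_3 = w_3 + \frac{13}{9w_3} + \frac{5}{3} = w_3 + \frac{13\overline{w_3}}{9|w_3|^2} + \frac{5}{3} = w_3 + \overline{w_3} + \frac{5}{3} = -\sqrt{3} + 1$$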
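The recipe of this crumpet can be tried numerically. What follows is only a minimal sketch, written in Python rather than Octave for illustration; the function name `cardano_roots` and the handling of the degenerate cases are assumptions of the sketch, not part of the method as stated above.

```python
import cmath

def cardano_roots(a, b, c, d):
    """Roots of a*x^3 + b*x^2 + c*x + d = 0 (a != 0) via p, q, and w."""
    p = (3*a*c - b**2) / (3*a**2)
    q = (2*b**3 - 9*a*b*c + 27*a**2*d) / (27*a**3)
    root = cmath.sqrt(q**2/4 + p**3/27)
    # Either sign of the square root leads to the same three solutions.
    # Switch signs only if that keeps w farther from zero (we divide by w).
    w_cubed = -q/2 - root
    if abs(w_cubed) < abs(-q/2 + root):
        w_cubed = -q/2 + root
    if w_cubed == 0:                 # p = q = 0: a single triple root
        return [-b/(3*a)] * 3
    # One cube root from the polar form, then rotate by 120 degrees twice.
    w1 = cmath.rect(abs(w_cubed)**(1/3), cmath.phase(w_cubed)/3)
    omega = cmath.exp(2j*cmath.pi/3)
    return [w - p/(3*w) - b/(3*a) for w in (w1, w1*omega, w1*omega**2)]

# The worked example: -x^3 + 5x^2 - 4x - 6 = 0
for x in cardano_roots(-1, 5, -4, -6):
    print(x)
```

Because the arithmetic is done in floating point, the printed solutions carry tiny imaginary parts where the exact solutions are real.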


For an equation you most likely did not see in pre-calculus, or calculus for that matter, consider

$$x - e^x \cos\sqrt{e^{2x} - x^2} = 0.$$

You might try to solve this equation exactly, with a pencil and paper, but you would soon run into a dead end. This equation cannot be solved explicitly. The best you can hope for is to approximate the solutions with a numerical method. To get some idea what we are in for, look at the graph of $x - e^x \cos\sqrt{e^{2x} - x^2}$ in Figure 2.4.1. The function oscillates wildly, and only oscillates more wildly as $x$ increases. The graph crosses the $x$-axis 29 times on the interval from 0 to 4.5, so it has 29 roots there! They are

.3181315052047641, 1.668024051576096, 2.062277729598284,

2.439940377216816, 2.653191974038697, . . .

and can be found by Newton's method with initial values 0, 1.5, 2, 2.4, 2.6, . . .. Can you find the next root? Answer on page 72.
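As a rough illustration of the computation just described, here is a minimal Python sketch of Newton's method applied to this function with initial value 0. The derivative `dg` was worked out by hand with the chain rule, and the names `g`, `dg`, and `newton`, along with the tolerance and iteration cap, are assumptions of the sketch.

```python
import math

def g(x):
    s = math.sqrt(math.exp(2*x) - x**2)
    return x - math.exp(x) * math.cos(s)

def dg(x):
    # With s(x) = sqrt(e^(2x) - x^2), the chain rule gives s'(x) = (e^(2x) - x)/s(x).
    s = math.sqrt(math.exp(2*x) - x**2)
    ds = (math.exp(2*x) - x) / s
    return 1 - math.exp(x) * math.cos(s) + math.exp(x) * math.sin(s) * ds

def newton(g, dg, x0, tol=1e-12, max_iter=50):
    x = x0
    for _ in range(max_iter):
        x_new = x - g(x) / dg(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("Method failed. Maximum iterations exceeded.")

root = newton(g, dg, 0.0)
print(root)
```

Changing the initial value in the last call is one way to explore the other roots, keeping in mind how sensitive this function is to the starting point.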
Figure 2.4.1: The graph of $x - e^x \cos\sqrt{e^{2x} - x^2}$ crosses the $x$-axis infinitely many times.



Secant Method (pseudo-code)

A straightforward implementation of the secant method can easily be inefficient due to the number of times $g$ appears in the formula on page 67. The pseudo-code below takes great care not to compute each value of $g$ more than once. If it seems more complicated than necessary, this is likely the source of the complication.

Assumptions: $g$ has a root at $\hat{x}$. $g$ is differentiable in a neighborhood of $\hat{x}$. $x_0$ and $x_1$ are sufficiently close to $\hat{x}$.

Input: Initial values $x_0$ and $x_1$; function $g$; desired accuracy $tol$; maximum number of iterations $N$.

Step 1: Set $y_0 = g(x_0)$; $y_1 = g(x_1)$

Step 2: For $j = 1 \ldots N$ do Steps 3-5:

Step 3: Set $x = x_1 - y_1 \frac{x_1 - x_0}{y_1 - y_0}$

Step 4: If $|x - x_1| < tol$ then return $x$;

Step 5: Set $x_0 = x_1$; $y_0 = y_1$; $x_1 = x$; $y_1 = g(x_1)$

Step 6: Print "Method failed. Maximum iterations exceeded."

Output: Approximation $x$ near exact root, or message of failure.
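The steps above can be sketched in Python as follows. This is one possible transcription, not the companion code: the function name `secant` and the choice to raise an exception in place of printing the failure message are assumptions made for the sketch.

```python
def secant(g, x0, x1, tol, max_iter):
    """Secant method following Steps 1-6; g is evaluated once per iteration."""
    y0, y1 = g(x0), g(x1)                       # Step 1
    for _ in range(max_iter):                   # Step 2
        x = x1 - y1 * (x1 - x0) / (y1 - y0)     # Step 3
        if abs(x - x1) < tol:                   # Step 4
            return x
        x0, y0 = x1, y1                         # Step 5
        x1, y1 = x, g(x)
    raise RuntimeError("Method failed. Maximum iterations exceeded.")  # Step 6

# Example: the positive root of x^2 - 2
print(secant(lambda x: x*x - 2, 1.0, 2.0, 1e-10, 50))
```

Note that Step 3 divides by $y_1 - y_0$, so the sketch fails with a division error if two consecutive function values coincide.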


Seeded Secant Method (pseudo-code)

The greatest drawback to the secant method is the necessity of two initial values. They should be near one another, but how near, and how do you decide? These are tough questions, and the answers are complicated at best. One reasonable approach is to let $x_1 = x_0 + g(x_0)$. Assuming $x_0$ is near a root, $g(x_0)$ will be small, so $x_1$ will be near $x_0$. Taking this approach relieves the user of the burden of selecting a second initial value. There are times when such automated selection is not desirable, so both methods have their place. This method only works well when the initial approximation is good.

Assumptions: $g$ has a root at $\hat{x}$. $g$ is differentiable in a neighborhood of $\hat{x}$. $x_0$ is sufficiently close to $\hat{x}$.

Input: Initial value $x_0$; function $g$; desired accuracy $tol$; maximum number of iterations $N$.

Step 1: Set $y_0 = g(x_0)$; $x_1 = x_0 + y_0$; $y_1 = g(x_1)$

Step 2: For $j = 1 \ldots N$ do Steps 3-5:

Step 3: Set $x = x_1 - y_1 \frac{x_1 - x_0}{y_1 - y_0}$

Step 4: If $|x - x_1| < tol$ then return $x$;

Step 5: Set $x_0 = x_1$; $y_0 = y_1$; $x_1 = x$; $y_1 = g(x_1)$

Step 6: Print "Method failed. Maximum iterations exceeded."

Output: Approximation $x$ near exact root, or message of failure.
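A Python sketch of the seeded variant differs from the plain secant method only in Step 1. The function name `seeded_secant` and the early return when $g(x_0)$ is exactly zero are assumptions added for the sketch (without that guard, the seed would coincide with $x_0$ and Step 3 would divide by zero).

```python
def seeded_secant(g, x0, tol, max_iter):
    """Seeded secant method: the second initial value is x0 + g(x0)."""
    y0 = g(x0)                                  # Step 1
    if y0 == 0:                                 # x0 is already a root
        return x0
    x1 = x0 + y0
    y1 = g(x1)
    for _ in range(max_iter):                   # Step 2
        x = x1 - y1 * (x1 - x0) / (y1 - y0)     # Step 3
        if abs(x - x1) < tol:                   # Step 4
            return x
        x0, y0 = x1, y1                         # Step 5
        x1, y1 = x, g(x)
    raise RuntimeError("Method failed. Maximum iterations exceeded.")  # Step 6

# Example: the positive root of x^2 - 2, seeded from x0 = 1.5
print(seeded_secant(lambda x: x*x - 2, 1.5, 1e-10, 50))
```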
Key Concepts

Rational Roots Theorem: If the polynomial $p(x) = a_0 + a_1 x + \cdots + a_k x^k$ has integer coefficients, then any rational roots of $p$ are in the set $\left\{ \frac{n}{d} : n \text{ is a factor of } a_0 \text{ and } d \text{ is a factor of } a_k \right\}$.

Synthetic division: A method for calculating the quotient of a polynomial by a linear factor $x - t$. Example on page 68.

Newton's method: A root finding method that generally converges to a root of $g(x)$ quadratically, but requires the use of the derivative. In this method, $x_0$ is chosen and $x_{n+1} = x_n - \frac{g(x_n)}{g'(x_n)}$ is computed for each $n \ge 0$.

Secant method: A root finding method that generally converges to a root of $g(x)$ with order approximately 1.618, but does not require the use of the derivative. In this method, $x_0$ and $x_1$ are chosen and $x_{n+1} = x_n - g(x_n)\frac{x_n - x_{n-1}}{g(x_n) - g(x_{n-1})}$ is computed for each $n \ge 1$.

Seeded secant method: A modification of the secant method where $x_0$ is chosen and $x_1 = x_0 + g(x_0)$.


Exercises

1. Write Octave code that implements Newton's method as a function.

2. Write Octave code that implements the secant method as a function.

3. Write Octave code that implements the seeded secant method as a function.

4. Use your secant method function from question 2 with a tolerance of $10^{-5}$ to find a solution of

(a) $e^x + 2^{-x} + 2\cos x - 6 = 0$ using $1 \le x_0 \le 2$.

(b) $\ln(x - 1) + \cos(x - 1) = 0$ using $1.3 \le x_0 \le 2$.

(c) $2x\cos x - (x - 2)^2 = 0$ using $2 \le x_0 \le 3$.

(d) $2x\cos x - (x - 2)^2 = 0$ using $3 \le x_0 \le 4$.

(e) $(x - 2)^2 - \ln x = 0$ using $1 \le x_0 \le 2$.

(f) $(x - 2)^2 - \ln x = 0$ using $e \le x_0 \le 4$.

5. Repeat exercise 4 using your Newton's method code from question 1.

6. Repeat exercise 4 using your seeded secant method code from question 3.

7. Repeat exercise 4 using a tolerance of $10^{-10}$. Taking this new value as the exact value, did using a tolerance of $10^{-5}$ give a result accurate to within $10^{-5}$ of the exact value?

8. Let <?(*) = tS? sin (~y) and *o = 1.25. Find *i and *2 
of Newton’s method. ^ 

9. Let $g(x) = 2\ln(1 + x^2) - x$. Find $x_{14}$ using Newton's method with

(a) $x_0 = 5$

(b) $x_0 = 1.2$

10. Let $g(x) = 2\ln(1 + x^2) - x$. Find $x_2$ and $x_3$ using the secant method with

(a) $x_0 = 5$ and $x_1 = 6$

(b) $x_0 = 1$ and $x_1 = 2$


11. Compare the secant method and Newton's method based on questions 4 and 5. Which finds roots in fewer iterations? Which one fails least often? Which is better?

12. Compute the first three iterations of Newton's method applied to $g(x) = x - (\sqrt{2})^x$ with $x_0 = 3$.

13. Find a value of $x_0$ for which Newton's method will fail to converge to a root of $g(x) = 2 + x - e^x$.

14. Explain why Newton's method fails to converge for the function $g(x) = x^2 + x + 1$ with $x_0 = 1$.

15. Let $g(x) = \frac{2\ln(1 + x^2) - x}{1 + x^2}$. Using Newton's method to find a root of $g(x)$ with $x_0 = 5$ yields $x_{14} = 8.6624821192$ and with $x_0 = 1.2$ yields $x_{14} = 0$. Compare these values of $x_{14}$ with the fourteenth iterations from question 9 and explain any similarities or differences.

16. Let $g(x) = e^{3x} - 27x^6 + 27x^4 e^x - 9x^2 e^{2x}$ and let $p_0 = 4$. Find $p_{10}$ using Newton's method. HINT: $g'(x) = 3e^{3x} - 18(x + x^2)e^{2x} + 27(x^4 + 4x^3)e^x - 162x^5$. [A]

17. Newton's method does not introduce spurious solutions. Suppose $f(x) = x - \frac{g(x)}{g'(x)}$ and $g'(\hat{x}) \ne 0$. Prove that $\hat{x}$ is a root of $g$ if and only if $\hat{x}$ is a fixed point of $f$. Hint: one direction is proven in the text of this section.

18. The polynomial $g(x) = x^4 + 2x^3 - x - 3$ has a root $\hat{x} \approx 1.097740792$. Find the largest neighborhood $(a, b)$ of $\hat{x}$ such that Newton's method converges to $\hat{x}$ for any initial value $x_0 \in (a, b)$.

19. Use Newton's method to find a negative solution of

$$0 = 12x^4 - 13x^3 + 7x^2 + x - 130$$

accurate to the nearest $10^{-4}$. What initial value did you use? How many iterations did it take?

20. Consider the function $g(x) = e^{6x} + 3(\ln 2)^2 e^{2x} - (\ln 8)e^{4x} - (\ln 2)^3$. Compute enough iterations of Newton's method with $x_0 = 0$ to approximate a zero of $g$ with tolerance 0.0002. Construct the Aitken's delta squared sequence $(a_n)$. Is the order of convergence improved?

21. As with Newton's method, the secant method can easily be described geometrically: Draw the line through the two points $(x_0, f(x_0))$ and $(x_1, f(x_1))$. Find the intersection of this line with the $x$-axis. The $x$-coordinate of the intersection is $x_2$. Find $x_3$ by intersecting the line through $(x_1, f(x_1))$ and $(x_2, f(x_2))$ with the $x$-axis. And so on. Graph the polynomial $p(x) = x^3 - 3x + 3$, and demonstrate the first iteration of the secant method graphically for $x_0 = -1$ and $x_1 = -2$. [S]

22. Suppose you are using the secant method with $x_0 = 1$ and $x_1 = 1.1$ to find a root of $f(x)$.

(a) Find $x_2$ given that $f(1) = 0.3$ and $f(1.1) = 0.23$.

(b) Create a sketch (graph) that illustrates the calculation. HINT: $x_2$ will be located where the line through $(x_0, f(x_0))$ and $(x_1, f(x_1))$ crosses the $x$-axis.

23. Use the graph of $g$ to answer the following questions. $g$ has roots at $-2\pi$, $-\pi$, $\pi$, and $2\pi$.

(a) To which root will Newton's method converge if $x_0 = 2.5$?

(b) What will happen if $x_0 = 0$?

(c) Find a positive integer value of $x_0$ for which Newton's method will converge to $2\pi$.

(d) Find a negative value of $x_0$ for which Newton's method will converge to $2\pi$.

24. Graph the polynomial $p(x) = x^3 - 3x + 3$, and demonstrate Newton's method graphically for $x_0 = -1$.

25. Use your code from question 2 to find a root of the function in the interval of question 2 on page 43 to within $10^{-8}$. Compare your answer to that from question 4 on page 43.

26. The sum of two numbers is 20. If each number is added to its square root, the product of the two sums is 172.2. Determine the two numbers to within $10^{-4}$ of their exact values.

27. Find an example of a situation in which Newton's method will fail on the second iteration (i.e., $x_1$ may be calculated but $x_2$ may not).

28. Let $h(x) = 2.2x^3 - 6.6x^2 + 4.4x$ and let $g(x) = h^{\circ 3}(x)$. That is, $g(x) = h(h(h(x)))$. Approximate a root of $g'(x)$.

29. For what values of $x_0$, approximately, will Newton's method converge to $-2.5$?

30. For the function shown in question 29, find $x_2$ and $x_3$ for the secant method with $x_0 = -10$ and $x_1 = 6$.

31. Let 

f W = 1 °-f 0 YVt dt - 

Approximate the positive root of $f$.

32. Of the root finding methods we have surveyed so far 
(Bisection, Fixed Point, Newton’s, Secant, and Stef- 
fensen’s), which one do you feel is the best? Why? 


Answers

Quadratic convergence?

    n    x_n                    x_n
    0    2                      -1
    1    2.5                    -.7647058823529411
    2    2.666666666666667      -.7326286052763475
    3    2.722222222222227      -.7320509933083684
    4    2.731741086881274      -.7320508075688965
    5    2.732050478023325
    6    2.732050807568503

         2.732050807568877      -.7320508075688772

The convergence looks quadratic since the number of significant digits of accuracy roughly doubles with the last couple of iterations.

Next root? The next root is approximately 2.872257717171606. This can be found using Newton's method with $x_0 = 2.81$, for example. Note this computation is very sensitive to initial conditions because there are so many roots near one another. Starting with $x_0 = 2.8$, for example, leads to the root at 9.662623060421268!




2.5 More Convergence Diagrams 

The cubic function $g(x) = 1 - x^3$ has one real root, 1. But it also has two complex roots. If you have studied complex analysis, you probably know what the other two are. And even if you have not studied complex analysis, you can figure them out by basic techniques of pre-calculus. Since 1 is a root, you can use synthetic division to deflate the polynomial:

    1 |  -1    0    0    1
      |       -1   -1   -1
      ---------------------
         -1   -1   -1    0

This division shows that $g(x) = (x - 1)(-x^2 - x - 1)$, so the other two roots are the solutions of the equation $-x^2 - x - 1 = 0$, thus deflating the problem to a quadratic. The solutions are $-\frac{1}{2} \pm \frac{\sqrt{3}}{2}i$. By the way, you may also recognize $1 - x^3$ as one of the special forms of polynomials, the difference of cubes.

Of course this is all fascinating, but what does this have to do with numerical analysis? What may surprise 
you is that fixed point iteration (and, therefore, Newton’s method), the secant method, and Steffensen’s method 
can all be used to find complex roots just as well as real ones! In fact, the algorithms need no modification! The 
programming language used to implement the methods, of course, does need to be able to handle complex number 
arithmetic. Octave does so without ado. 

First, finding a root of $g(x) = 1 - x^3$ and finding a fixed point of $f(x) = 1/x^2$ are equivalent. Why? Answer on page 80. Setting $x_0 = -1 + i$ and applying Newton's method and the secant method to $g(x) = 1 - x^3$, and Steffensen's method to $f(x) = 1/x^2$, we get the following values of $x_i$:

    i   Steffensen's                Secant                      Newton's
    0   -1 + i                      -1 + i                      -1 + i
    1   -0.85 + 0.8i                -0.66666666 + 0.83333333i   -0.66666666 + 0.83333333i
    2   -0.60313824 + 0.67770639i   -0.55034016 + 0.82376444i   -0.50869191 + 0.84109987i
    3   -0.39846066 + 0.84671567i   -0.49763752 + 0.85554014i   -0.49932999 + 0.86626917i
    4   -0.51660491 + 0.84998590i   -0.49932718 + 0.86627140i   -0.49999991 + 0.86602490i
    5   -0.49910537 + 0.86543351i   -0.50000774 + 0.86602504i   -0.50000000 + 0.86602540i
    6   -0.50000228 + 0.86602568i   -0.49999999 + 0.86602540i
    7   -0.50000000 + 0.86602540i   -0.50000000 + 0.86602540i



Each sequence quickly converges to the complex root $-\frac{1}{2} + \frac{\sqrt{3}}{2}i$. And this is not a fluke or a contrived example. Generally, these methods work just as well in the complex plane as they do on the real line. One can find real roots starting with complex numbers too. If we change the initial value $x_0$ to $1 + i$, Newton's method converges to 1, for example.
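The claim that the algorithm needs no modification is easy to test in any language with complex arithmetic. Here is a small Python sketch (the book itself uses Octave); the function `newton` and its default tolerance and iteration cap are assumptions made for the illustration.

```python
def newton(g, dg, x0, tol=1e-10, max_iter=50):
    """Plain Newton's method; nothing here is specific to real numbers."""
    x = x0
    for _ in range(max_iter):
        x_new = x - g(x) / dg(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("Method failed. Maximum iterations exceeded.")

g = lambda x: 1 - x**3
dg = lambda x: -3 * x**2

print(newton(g, dg, -1 + 1j))   # start at x0 = -1 + i
print(newton(g, dg, 1 + 1j))    # start at x0 = 1 + i
```

The only change from a real-valued computation is the initial value: Python's `1j` plays the role of $i$.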

Having expanded our view of the methods to include complex numbers, there is a new type of convergence diagram to consider. We can now look at convergence patterns for the three methods over a host of initial values in the complex plane, not just the real line. Figure 2.5.1 shows convergence diagrams for Newton's method with $g(x) = 1 - x^3$, the seeded secant method with $g(x) = 1 - x^3$, and Steffensen's method with $f(x) = 1/x^2$. Each diagram covers the part of the complex plane with real parts in $[-5, 5]$ and imaginary parts in $[-3.75, 3.75]$. The top left corner of each diagram represents initial value $-5 + 3.75i$ and the bottom right corner represents initial value $5 - 3.75i$. The center of each diagram represents the initial value 0. The colors correspond to the three roots, red to 1, green to $-\frac{1}{2} + \frac{\sqrt{3}}{2}i$, and blue to $-\frac{1}{2} - \frac{\sqrt{3}}{2}i$. Black corresponds to failure to converge. The different intensities of red, green, and blue correspond to the number of iterations the method took to converge. The greater the intensity,
the fewer iterations. We can see that for $x_0 = 5 + 3.75i$, Newton's method and the seeded secant method both converge to $-\frac{1}{2} + \frac{\sqrt{3}}{2}i$, because the upper right hand corner of each diagram is colored green. Steffensen's method, on the other hand, fails to converge to any root if begun with $x_0 = 5 + 3.75i$, evidenced by the blackness in the upper right hand corner of the convergence diagram.

The dwell represents the maximum number of iterations allowed, so actually the black dots represent initial 
values for which convergence was not achieved within a number of iterations equal to or less than the dwell. That’s 
different from claiming the method does not converge at all for these initial values. There’s a chance that some of 
the blackened initial values would still lead to convergence if allowed more iterations. 
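To make the idea of a convergence diagram and its dwell concrete, here is a coarse, text-only Python sketch for Newton's method with $g(x) = 1 - x^3$. It is not the code behind Figure 2.5.1: the function `classify`, the tolerance, the grid spacing, and the letters R, G, B standing in for the colors are all assumptions of the sketch, and iteration counts are computed but not displayed as intensities.

```python
import cmath

# The three roots of 1 - x^3, in the order red, green, blue.
ROOTS = [1, cmath.exp(2j * cmath.pi / 3), cmath.exp(-2j * cmath.pi / 3)]

def classify(x0, dwell=20, tol=1e-8):
    """Return (index of the root reached, iterations used), or (None, k) on failure."""
    x = x0
    for k in range(1, dwell + 1):
        d = -3 * x**2
        if d == 0:                    # Newton's step is undefined here
            return None, k
        x = x - (1 - x**3) / d
        for idx, r in enumerate(ROOTS):
            if abs(x - r) < tol:
                return idx, k
    return None, dwell                # no convergence within the dwell

# Real parts in [-5, 5], imaginary parts in [-3.75, 3.75], top row first.
for row in range(11):
    im = 3.75 - 0.75 * row
    line = ""
    for col in range(21):
        idx, _ = classify(complex(-5 + 0.5 * col, im))
        line += "." if idx is None else "RGB"[idx]
    print(line)
```

A dot marks an initial value that did not converge within the dwell, which, as noted above, is not the same as saying the method diverges there.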
Figure 2.5.1: Convergence diagrams over the complex plane. From top to bottom: Newton's method with $g(x) = 1 - x^3$ and dwell 20; seeded secant method with $g(x) = 1 - x^3$ and dwell 40; Steffensen's method with $f(x) = \frac{1}{x^2}$ and dwell 40. Each diagram covers the part of the complex plane with real parts in $[-5, 5]$ and imaginary parts in $[-3.75, 3.75]$.

Figure 2.5.2: A vertical line and its image under the exponential function.



Two things are very striking about these convergence diagrams. First, the seeded secant method and Newton’s 
method converge for a much larger set of initial values than does Steffensen’s method. This is, at least in part, 
due to the function chosen. For other functions, there may be a fixed point scheme for which Steffensen’s method 
converges on large sets of initial values too. Second, the patterns of colors are extremely intricate, even fractal 
in nature. Predicting to which root a method will converge for a given initial value, and indeed whether it will 
converge at all, are very difficult questions! And this analysis is done on a rather benign (simple) function. 

Consider now a much more complicated problem: finding the roots of $g(z) = e^z - z$ or, equivalently, finding the fixed points of $f(z) = e^z$. A graph of $f(z)$ (over the real numbers) will quickly convince you that there are no real number solutions. It will take some thought to determine the nature of any complex solutions.

To that end, fix a real number $a_0$ and consider the vertical line in the complex plane, $L_{a_0} = \{a_0 + ib : b \in \mathbb{R}\}$. The image of $L_{a_0}$ under the exponential function is a circle, $C_{a_0}$, with radius $e^{a_0}$ centered at the origin. Indeed, $e^{a_0 + ib} = e^{a_0}e^{ib} = e^{a_0}(\cos b + i\sin b)$. Thus $b$ parameterizes the circle about the origin with radius $e^{a_0}$. Now, suppose $L_{a_0}$ contains a fixed point, $z = a_0 + ib$, of the exponential function, $f(z) = e^z$. Then $z = f(z)$, or $a_0 + ib = e^{a_0}(\cos b + i\sin b)$. We conclude that the line and the circle intersect at the fixed point. Every fixed point of $f$ is necessarily an intersection of the line $L_{a_0}$ with the circle $C_{a_0}$ for some $a_0$. Figure 2.5.2 shows a representative example. In fact, the diagram shows an interesting case: $x = a_0 \approx 2.439940377216816$. The coordinates of the two intersections are

$$(2.439940377216816, \pm 11.2098911414971).$$

The interesting thing is

$$e^{2.439940377216816 + 11.2098911414971i} = 2.439940377216816 - 11.2098911414971i$$

and

$$e^{2.439940377216816 - 11.2098911414971i} = 2.439940377216816 + 11.2098911414971i.$$

The two points are images of one another under the exponential function! What we have found here are called periodic points. If we let $z_1 = 2.439940377216816 - 11.2098911414971i$ and $z_2 = 2.439940377216816 + 11.2098911414971i$, then $e^{z_1} = z_2$ and $e^{z_2} = z_1$. Hence, if we iterate $z_2 = f(z_1)$, $z_3 = f(z_2)$, $z_4 = f(z_3)$, $z_5 = f(z_4)$, and so on, the sequence $z_1, z_2, z_3, z_4, \ldots$ actually looks like

$$z_1, z_2, z_1, z_2, z_1, z_2, \ldots.$$

The sequence just flops back and forth between $z_1$ and $z_2$ in a periodic fashion. We call such values period 2 points. They are not fixed points of $f(z)$ but they are fixed points of $f(f(z))$!
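The back-and-forth behavior can be checked numerically. The Python sketch below takes the value of $a_0$ quoted above on trust and recomputes $b$ from the relation $a_0^2 + b^2 = e^{2a_0}$ that holds on the circle; the variable names are assumptions made for the illustration.

```python
import cmath
import math

a0 = 2.439940377216816
b = math.sqrt(math.exp(2 * a0) - a0**2)   # a0 + ib lies on the circle of radius e^a0

z1 = complex(a0, -b)
z2 = cmath.exp(z1)    # one application of f(z) = e^z
z3 = cmath.exp(z2)    # a second application

print(z1)
print(z2)
print(z3)
```

Up to rounding, the second line printed is the conjugate of the first, and the third repeats the first, which is exactly what it means for $z_1$ to be a fixed point of $f(f(z))$ without being a fixed point of $f(z)$.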


Crumpet 14: Periodic points.

If a sequence $(p_n)$ has the form

$$p_1, p_2, \ldots, p_k, p_1, p_2, \ldots, p_k, p_1, \ldots, \quad k > 1$$

then we say $p_1$ is a period $k$ point (and $p_2, p_3, \ldots, p_k$ are too!).

Figure 2.5.3: More convergence diagrams over the complex plane. From left to right: Newton's method with $g(z) = z - e^z$ and dwell 20; secant method with $g(z) = z - e^z$ and dwell 40; Steffensen's method with $f(z) = e^z$ and dwell 40. Each diagram covers the part of the complex plane with real parts in $[-10, 30]$ and imaginary parts in $[0, 73]$.


On the other hand, $z = 2.062277729598284 + 7.588631178472513i$ is (approximately) a fixed point of $f(z)$ since

$$e^{2.062277729598284 + 7.588631178472513i} = 2.062277729598284 + 7.588631178472513i.$$

Moreover, the conjugate of $z$, $\bar{z} = 2.062277729598284 - 7.588631178472513i$, is also a fixed point. Verify it with a calculator or with Octave!

Generally, if $z$ is a fixed point of $e^z$ then so is $\bar{z}$:

$$z = e^z \implies \bar{z} = \overline{e^z} = e^{\bar{z}}.$$

So if we find one fixed point, we actually have found two, the fixed point and its conjugate.

We're ready to get back to considering intersections of $L_{a_0}$ and $C_{a_0}$. Assume $a_0 + ib$ is a fixed point of $e^z$. Then $a_0 + ib = e^{a_0 + ib} = e^{a_0}(\cos b + i\sin b)$, so

$$a_0 = e^{a_0}\cos b$$
$$b = e^{a_0}\sin b \qquad (2.5.1)$$

Now, because $a_0 + ib$ is a point of intersection, it is on $C_{a_0}$, so $a_0^2 + b^2 = e^{2a_0} \implies b = \pm\sqrt{e^{2a_0} - a_0^2}$. Finally, substituting $b = \sqrt{e^{2a_0} - a_0^2}$ into 2.5.1, we find an intersection point will be a fixed point if and only if

$$a_0 = e^{a_0}\cos\sqrt{e^{2a_0} - a_0^2}$$

and

$$\sqrt{e^{2a_0} - a_0^2} = e^{a_0}\sin\sqrt{e^{2a_0} - a_0^2}. \qquad (2.5.2)$$

You should pause long enough to consider why it is not necessary to substitute $b = -\sqrt{e^{2a_0} - a_0^2}$ into 2.5.1. Hint: make the substitution and simplify. You should find out that the two equations you get are equivalent to those in 2.5.2.

For example, $2.439940377216816 - 11.2098911414971i$ and $2.062277729598284 + 7.588631178472513i$ both satisfy the first equation of 2.5.2, but $2.439940377216816 - 11.2098911414971i$ does not satisfy the second while $2.062277729598284 + 7.588631178472513i$ does. So, as observed earlier, $2.439940377216816 - 11.2098911414971i$ is not a fixed point but $2.062277729598284 + 7.588631178472513i$ is.
Do you recognize the first equation of 2.5.2? We first saw it on page 69 in section 2.4. As noted there, the smallest five solutions are

.3181315052047641, 1.668024051576096, 2.062277729598284,

2.439940377216816, 2.653191974038697, . . .

The values 2.062277729598284 and 2.439940377216816 provided the examples for this discussion. What about the other three values in this list? Do they give fixed points of the exponential function? Period two points? Something else? Take a moment to investigate. Answers are on page 80. Using Octave to investigate 2.062277729598284, which we know is a fixed point:

    octave:1> format('long')
    octave:2> a0=2.062277729598284
    a0 = 2.06227772959828
    octave:3> b=sqrt(exp(2*a0)-a0^2)
    b = 7.58863117847251
    octave:4> exp(a0+I*b)
    ans = 2.06227772959828 + 7.58863117847251i

verifies that $e^{a_0 + ib} = a_0 + ib$ for $a_0 = 2.062277729598284$, at least to machine precision. The exact value of the fixed point is not known, but that is the nature of numerical analysis.

Figure 2.5.3 shows convergence to 12 of the fixed points of $e^z$, one for each of the 12 different colors. The coordinates of each fixed point can be approximated by locating the spot of greatest intensity within each colored band.

As was done in Figure 2.5.3, convergence diagrams for the secant method can be created by setting $x_1 = x_0 + \delta$ for some small number $\delta$. It does not matter whether $\delta$ is real or complex. Selecting $x_1$ automatically this way allows the diagram to show convergence or divergence based on $x_0$ alone, just as is done for the other convergence diagrams. You will notice that the convergence diagram for the secant method and the convergence diagram for Newton's method are quite similar. For sufficiently small $\delta$, this will be the case in general. The secant method convergence diagram and the Newton's method convergence diagram for the same function over the same region will look very much the same. The only significant difference will be the number of iterations needed for convergence. The secant method will need more iterations to converge.


Exercises 

1. Match the function with its Newton's method convergence diagram. The real axis passes through the center of each diagram, and the imaginary axis is represented, but is not necessarily centered.

$f(x) = 56 - 152x + 140x^2 - 17x^3 - 48x^4 + 9x^5$

$g(x) = (x^2)(\ln x) + (x - 3)e^x$

$h(x) = 1 + 2x + 3x^2 + 4x^3 + 5x^4 + 6x^5$

$l(x) = (\ln x)(x^3 + 1)$

2. Match the function with its Newton's method convergence diagram. The real axis passes through the center of each diagram, and the imaginary axis is represented, but is not necessarily centered.

$f(x) = \sin x$

$g(x) = \sin x - e^{-x}$

$h(x) = e^x + 2^{-x} + 2\cos x - 6$

$l(x) = x^4 + 2x^2 + 4$

3. Find a polynomial that has the following roots and no others.

(a) $-7, 2, 1 \pm 5i$

(b) $-7, 2, 1 + 5i$

(c) $-4, -1, 2, \pm 2$

(d) $-4, -1, 2, 2i$

(e) $0, -1 \pm i, 1 \pm i$

(f) $-3 + i, -2 - i, -3i, 1 - 2i$

4. Create Newton’s method convergence diagrams for the polynomials of question 3. Make sure you capture a region that 
shows at least a small area converging to each root. Octave code may be downloaded at the companion website. 

5. The functions /(*) = e x and g(x) = r2 1 , L have no roots, real or complex. Find at least two others that also have no 
roots. 

6. Let /(*) = * 2 -t*+ 10 + sin(3*). 

(a) Find all the real roots of $f$. This is not a polynomial, so deflation will not work. Instead, graph the function and use Newton's method to find the real roots accurate to $10^{-8}$. There are four of them.

(b) Create a Newton's method convergence diagram for $f$ to see if there are any complex roots. If so, use Newton's method to approximate them. Use the convergence diagram to help you choose initial values.

(c) Can you find all the roots of $f$?


7. Match the function with its seeded secant method convergence diagram. The real axis passes through the center of each diagram, and the imaginary axis is represented, but is not necessarily centered.

$f(x) = \sin x$

$g(x) = \sin x - e^{-x}$

$h(x) = e^x + 2^{-x} + 2\cos x - 6$

$l(x) = 56 - 152x + 140x^2 - 17x^3 - 48x^4 + 9x^5$

8. Match the function with its seeded secant method convergence diagram. The real axis passes through the center of each diagram, and the imaginary axis is represented, but is not necessarily centered.

$f(x) = x^4 + 2x^2 + 4$

$g(x) = (x^2)(\ln x) + (x - 3)e^x$

$h(x) = 1 + 2x + 3x^2 + 4x^3 + 5x^4 + 6x^5$

$l(x) = (\ln x)(x^3 + 1)$

9. Create seeded secant method convergence diagrams for the polynomials of question 3. Make sure you capture a region that shows at least a small area converging to each root. Octave code may be downloaded at the companion website.

10. The Newton’s method convergence diagram for one polynomial is much like the Newton’s method convergence diagram 
for another. Interesting changes in the Newton’s method convergence diagrams and seeded secant method convergence 
diagrams can be achieved by multiplying a polynomial by a non-polynomial function with no roots. Create Newton’s 
method and seeded secant method convergence diagrams for products of functions in question 3 with functions in 
question 5. 

11. Discuss the relative strengths and weaknesses of Newton’s method, the secant method, and the seeded secant method. 

Answers 

Why equivalent? The equations $g(x) = 0$ and $f(x) = x$ have exactly the same solutions. $g(x) = 0 \iff 1 - x^3 = 0 \iff 1 = x^3 \iff \frac{1}{x^2} = x \iff f(x) = x$.

Nature of roots? .3181315052047641 is a fixed point of the exponential function:

    octave:1> format('long')
    octave:2> a0=.3181315052047641;
    octave:3> b=sqrt(exp(2*a0)-a0^2)
    b = 1.33723570143069
    octave:4> exp(a0+I*b)
    ans = 0.318131505204764 + 1.337235701430689i

1.668024051576096 is a period two point of the exponential function:

    octave:1> format('long')
    octave:2> a0=1.668024051576096;
    octave:3> b=sqrt(exp(2*a0)-a0^2)
    b = 5.03244706448616
    octave:4> exp(a0+I*b)
    ans = 1.66802405157609 - 5.03244706448616i

2.653191974038697 is a fixed point of the exponential function:

    octave:5> a0=2.653191974038697;
    octave:6> b=sqrt(exp(2*a0)-a0^2)
    b = 13.9492083345332
    octave:7> exp(a0+I*b)
    ans = 2.65319197403878 + 13.94920833453319i
2.6 Roots of Polynomials

Synthetic division revisited 

You may recall using the rational roots theorem and synthetic division to find roots of polynomials of degree 3 or 
more in algebra. The process was something like this. You made a list of possible roots based on the rational roots 
theorem. You checked each one using synthetic division until you either found a root or ran out of candidates. It 
is possible that was as far as your class took the process, but there is more to say. 

Suppose we have a polynomial $p(x)$ and a number $t$. Synthetic division gives coefficients of $q(x)$ such that $p(x) = q(x) \cdot (x - t) + p(t)$. For example, the synthetic division

      t                 p(x)
     -3 |  -4     2     3     -6
        |        12   -42    117
        --------------------------
           -4    14   -39  | 111
                q(x)         p(t)

tells us that $p(x) = -4x^3 + 2x^2 + 3x - 6 = (-4x^2 + 14x - 39)(x + 3) + 111$. While it is a small burden to evaluate the expression $-4x^3 + 2x^2 + 3x - 6$ when $x = -3$, it is no burden at all to evaluate $(-4x^2 + 14x - 39)(x + 3) + 111$ when $x = -3$. The $(x + 3)$ factor is zero, so it doesn't matter to what $(-4x^2 + 14x - 39)$ evaluates. The product is zero and $(-4x^2 + 14x - 39)(x + 3) + 111$ evaluates to 111. Therefore, $p(-3) = 111$. Synthetic division gives a quick way to evaluate a polynomial. The number at the end of the division is the value of the polynomial at the value of the divisor.

More generally, here is a dissection of the division of $p(x) = a_0 + a_1 x + \cdots + a_n x^n$ by $x - t$ using synthetic division:

     t |  a_n    a_{n-1}            a_{n-2}                         ...   a_0
       |         a_n t              t(a_n t + a_{n-1})              ...   t(... t(t(a_n t + a_{n-1}) + a_{n-2}) + ... + a_1)
       -----------------------------------------------------------------------
          a_n    a_n t + a_{n-1}    t(a_n t + a_{n-1}) + a_{n-2}    ...   p(t)


Beginning with $t$ in the upper left corner, we end up with $p(t)$ in the lower right corner. It is not only when the number in the lower right corner is zero that we find something of interest. Every synthetic division gives something of interest! The number in the bottom right corner is $p(t)$ whether it turns out to be zero or not. And there is more.

The numbers $a_n$, $a_n t + a_{n-1}$, $t(a_n t + a_{n-1}) + a_{n-2}$, and so on, appearing in the bottom row of the synthetic division give the coefficients of the quotient, $q(x)$. Every synthetic division gives a decomposition of the polynomial into quotient and remainder. Thus, with every synthetic division, we get an equivalent expression of the form $q(x) \cdot (x - t) + p(t)$. There is still more.
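The bottom row of a synthetic division is quick to compute by machine. As a minimal Python sketch (the function name `synthetic_division` and the convention of listing coefficients from the highest power down are assumptions made here), each new entry is the previous entry times $t$ plus the next coefficient:

```python
def synthetic_division(coeffs, t):
    """Divide p(x) by (x - t).

    coeffs lists the coefficients of p from the highest power down.
    Returns (coefficients of the quotient q, remainder p(t)).
    """
    bottom = [coeffs[0]]
    for a in coeffs[1:]:
        bottom.append(bottom[-1] * t + a)
    return bottom[:-1], bottom[-1]

# p(x) = -4x^3 + 2x^2 + 3x - 6 divided by x + 3
quotient, remainder = synthetic_division([-4, 2, 3, -6], -3)
print(quotient, remainder)
```

For the example of this section, the quotient coefficients and remainder returned are the bottom row of the division table shown earlier.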

Differentiating the equation $p(x) = q(x) \cdot (x - t) + p(t)$ with respect to $x$ gives

$$p'(x) = q'(x) \cdot (x - t) + q(x).$$

Hence, $p'(t) = q'(t) \cdot (t - t) + q(t) = q(t)$. So, not only do the numbers in the bottom row give the coefficients of the quotient, they double as coefficients appropriate for evaluating $p'(t)$. Returning to the previous example, if we desire to calculate $p'(-3)$, we simply continue the synthetic division as in

     -3 |  -4     2     3     -6
        |        12   -42    117
        --------------------------
     -3 |  -4    14   -39  | 111
        |        12   -78
        --------------------
           -4    26 | -117

and find out $p'(-3) = -117$. The procedure of calculating $p(t)$ and $p'(t)$ by simultaneous synthetic divisions is known as Horner's method and is especially convenient for use in Newton's method. If we were trying to find a root of $p(x) = -4x^3 + 2x^2 + 3x - 6$ with initial approximation $x_0 = -3$ we would have, at this point, $x_1 = x_0 - \frac{p(x_0)}{p'(x_0)} = -3 - \frac{111}{-117} \approx -2.05128$. Yet there is more.
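The two synthetic divisions can be carried out in a single loop. The Python sketch below is one way to arrange it; the function name `horner` and the coefficient ordering (highest power first) are assumptions of the sketch.

```python
def horner(coeffs, t):
    """Return (p(t), p'(t)) by two interleaved synthetic divisions.

    coeffs lists the coefficients of p from the highest power down.
    """
    p = coeffs[0]     # running bottom row of the first division
    dp = 0            # running bottom row of the second division
    for a in coeffs[1:]:
        dp = dp * t + p
        p = p * t + a
    return p, dp

# One Newton step for p(x) = -4x^3 + 2x^2 + 3x - 6 starting from x0 = -3
x0 = -3
p0, dp0 = horner([-4, 2, 3, -6], x0)
x1 = x0 - p0 / dp0
print(p0, dp0, x1)
```

Updating `dp` before `p` matters: the second division must use the bottom-row entry of the first division from the previous column, and it stops one column short of the remainder.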
Finding all the roots of polynomials

When we happen upon a root of the polynomial $p(x)$, the result of the synthetic division, $p(x) = q(x)(x - t) + p(t)$, reduces to $p(x) = q(x)(x - t)$ since $t$ is a root, meaning $p(t) = 0$. In this case, we have a factorization of $p(x)$. The rest of the roots of $p$ are exactly the roots of $q$, so having found one root, we have reduced the problem of finding roots of $p$ to (a) noting the root we have found plus (b) finding the roots of the polynomial $q$, a polynomial of one degree less than that of $p$. In this way, we have deflated the problem of finding the $n$ roots of the $n$th degree polynomial $p$ to finding the $n - 1$ roots of the $(n - 1)$-degree polynomial $q$. Taking it a step further, when we have found a root of $q$, we can use synthetic division to reduce the problem again. We (a) note the root of $q$ and (b) continue searching for roots of the quotient, an $(n - 2)$-degree polynomial. We continue this way, deflating the problem by one degree each time we find a root until we have reduced the problem to a 2nd degree polynomial. At this point, we have a quadratic polynomial and can use the quadratic formula to find the last two roots.

For example, $-1.18985$ is (approximately) a root of $p(x) = -4x^3 + 2x^2 + 3x - 6$. Synthetic division of $p(x)$ by $(x + 1.18985)$ gives

-1.18985 | -4    2         3         -6
         |       4.7594   -8.04267    6.00002
         +------------------------------------
           -4    6.7594   -5.04267    0.00002


The (near) zero at the bottom-right indicates that $-1.18985$ is approximately a root. There is no appreciable remainder upon division of $-4x^3 + 2x^2 + 3x - 6$ by $x + 1.18985$. Moreover, the numbers $-4$, $6.7594$, $-5.04267$ in the bottom row give the coefficients of $q(x)$. Thus, we find from this division that $-4x^3 + 2x^2 + 3x - 6 \approx (-4x^2 + 6.7594x - 5.04267)(x + 1.18985)$. We can now find the other two roots by locating the roots of $q(x) = -4x^2 + 6.7594x - 5.04267$. Using the quadratic formula, they are

$$\frac{-6.7594 \pm \sqrt{6.7594^2 - 4(-4)(-5.04267)}}{-8} \approx 0.84493 \pm 0.73944i.$$

Our process will lead us to finding $n$ roots of any $n$th degree polynomial. It is important to note that some of these roots may be complex and some of them may be repeated.
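
The deflation step in this example can be sketched in Octave. This is only an illustration under stated assumptions: the name syntheticDivide is hypothetical (it is not a function from the text), and the coefficients are listed from the highest power down so that they read like the tableau. The built-in roots() stands in for the quadratic formula.

```octave
1;  % marks this file as a script so a function can be defined inline

% Hypothetical helper: divide p(x) by (x - t) using synthetic division.
% c lists the coefficients of p from the highest power down to the constant.
% q holds the coefficients of the quotient; r is the remainder, p(t).
function [q, r] = syntheticDivide(c, t)
  n = numel(c) - 1;          % degree of p
  row = zeros(1, n + 1);     % bottom row of the tableau
  row(1) = c(1);
  for j = 2:n + 1
    row(j) = t * row(j - 1) + c(j);
  end
  q = row(1:n);
  r = row(n + 1);
end

c = [-4 2 3 -6];             % p(x) = -4x^3 + 2x^2 + 3x - 6
[q, r] = syntheticDivide(c, -1.18985);
disp(q)                      % approximately -4  6.7594  -5.0427
disp(r)                      % near zero, so -1.18985 is nearly a root
disp(roots(q))               % approximately 0.84493 +/- 0.73944i
```

Because the root is only approximate, the remainder is small but not exactly zero, and the error carries into the coefficients of the quotient.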


Crumpet 15: The Fundamental Theorem of Algebra 


The process of finding one root of a given polynomial, deflating, and finding another mirrors quite closely the 
mathematical theorems of algebra. The Fundamental Theorem of Algebra states that every polynomial with 
complex coefficients and degree at least one has a complex root. Thus our search for a root is not in vain! We can 
then write our polynomial in factored form and continue. The Fundamental Theorem says that there is again a 
root of the deflated polynomial. And if we keep track of all the roots as we find them, we end up writing our 
polynomial in the form 

$$p(x) = a(x - r_1)^{e_1}(x - r_2)^{e_2} \cdots (x - r_k)^{e_k}, \qquad (2.6.1)$$

where $a$ is a nonzero constant, $r_1, r_2, \ldots, r_k$ are the $k$ distinct complex roots, and $e_1, e_2, \ldots, e_k$ are the so-called (positive integer) multiplicities of the roots. From this form, we see that the degree of the polynomial equals the sum of the multiplicities, $e_1 + e_2 + \cdots + e_k$. This is what we mean when we say the number of roots, counting multiplicity, is equal to the degree of the polynomial. Thus when searching for the roots of a polynomial of degree $n$, we know we are looking for $n$ roots, but not necessarily $n$ distinct roots. Some of them may be repeated and the repetitions are accounted for in the multiplicities. To formalize the claim in equation 2.6.1, we have the following theorem.

Theorem 6. (Fundamental Factorization Theorem) If $n \ge 1$ and $p$ is a degree $n$ polynomial, then

$$p(x) = a(x - r_1)^{e_1}(x - r_2)^{e_2} \cdots (x - r_k)^{e_k}$$

for some constant $a \ne 0$, roots $r_1, r_2, \ldots, r_k$, and positive integer exponents $e_1, e_2, \ldots, e_k$ where

$$\sum_{j=1}^{k} e_j = n.$$
Proof. Suppose $n = 1$ so $p(x)$ takes the form $ax + b$ with $a \ne 0$. Then $p(x) = a\left(x - \left(-\frac{b}{a}\right)\right)^1$ and thus takes the required form. Now suppose all polynomials of some degree $n \ge 1$ take the required form and let $p$ be a polynomial of degree $n + 1$. By the Fundamental Theorem of Algebra, $p$ has a root. Call it $\rho$. Then $x - \rho$ is a factor of $p$ so $p$ can be written as $p(x) = (x - \rho) \cdot q(x)$ for some polynomial $q$ of degree $n$. By the inductive hypothesis, we have that $q$ takes the required form, so

$$p(x) = (x - \rho) \cdot a(x - r_1)^{e_1}(x - r_2)^{e_2} \cdots (x - r_k)^{e_k}$$

where $e_1 + e_2 + \cdots + e_k = n$. If $\rho$ is distinct from $r_1, r_2, \ldots, r_k$, then $p$ takes the form

$$p(x) = a(x - r_1)^{e_1}(x - r_2)^{e_2} \cdots (x - r_k)^{e_k}(x - \rho)^1.$$

If $\rho$ equals one of $r_1, r_2, \ldots, r_k$, say $r_j$, then $p$ takes the form

$$p(x) = a(x - r_1)^{e_1}(x - r_2)^{e_2} \cdots (x - r_j)^{e_j + 1} \cdots (x - r_k)^{e_k}.$$

In either case, $p$ takes the required form and the proof is complete. □


Pseudo-pseudo-code for this procedure might look something like this: 

Assumptions: $p$ is a polynomial of degree $n \ge 2$.

Input: Polynomial $p(x)$; tolerance tol; maximum number of iterations N.

Step 1: For $i = 1$ to $n - 2$ do Steps 2-5:

    Step 2: Find a root $x_0$ of $p(x)$ [using tol, N, and some root-finding method];

    Step 3: If error trying to find $x_0$ then
            return “Method failed. Root of degree $n - i + 1$ not found.”;

    Step 4: Factor $p(x)$ as $q(x) \cdot (x - x_0)$;

    Step 5: Set $x_i = x_0$; $p(x) = q(x)$;

Output: Approximate roots.

To refine the pseudo-pseudo-code into pseudo-code, we will use Newton’s method, assisted by Horner’s method, 
in Step 2. The usual drawback of Newton’s method, the requirement that the derivative be known and calculated, is 
but a small inconvenience when Horner’s method is employed. But how do we represent polynomials in a computer 
program so that we can accomplish Steps 4 and 5? The same way we implement code to execute Horner’s method. 
Pseudo-code for Horner’s method, with an array: 

Assumptions: $p$ is a polynomial of degree $n \ge 1$.

Input: array [c] of coefficients of $p(x) = c_1 + c_2 x + c_3 x^2 + \cdots + c_{n+1} x^n$; $x_0$.

Step 1: Set $y = c_{n+1}$; $z = c_{n+1}$;

Step 2: For $j = n, n-1, \ldots, 2$ do Step 3

    Step 3: Set $y = x_0 y + c_j$; $z = x_0 z + y$;

Step 4: Set $y = x_0 y + c_1$;

Output: $y = p(x_0)$ and $z = p'(x_0)$.
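
The pseudo-code above might translate into Octave along the following lines. This is a sketch, not a definitive implementation: the name hornerSketch is hypothetical, and, as in the pseudo-code, c(1) is assumed to hold the constant coefficient and c(n+1) the leading one.

```octave
1;  % marks this file as a script so a function can be defined inline

% Hypothetical sketch of the pseudo-code for Horner's method.
% c(1) is the constant coefficient and c(n+1) is the leading coefficient.
% y is the value p(x0); z is the value p'(x0).
function [y, z] = hornerSketch(c, x0)
  n = numel(c) - 1;          % degree of p
  y = c(n + 1);
  z = c(n + 1);
  for j = n:-1:2             % empty loop when n = 1
    y = x0 * y + c(j);
    z = x0 * z + y;
  end
  y = x0 * y + c(1);
end

c = [-6 3 2 -4];             % p(x) = -6 + 3x + 2x^2 - 4x^3
[y, z] = hornerSketch(c, -3);
printf("p(-3) = %g, p'(-3) = %g\n", y, z);   % 111 and -117
x1 = -3 - y / z;             % one step of Newton's method
printf("x1 = %.5f\n", x1);   % -2.05128
```

The values agree with the synthetic division carried out by hand earlier in the section for the same polynomial.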

As in synthetic division, there is no need to retain the variable to various exponents. Only the coefficients are 
needed to define a polynomial. So, in the program, a polynomial is represented by an array of numbers. Putting 
together our pseudo-pseudo code, Newton’s method and Horner’s method into a single program, we have a method 
for finding all the roots of a polynomial: 

Assumptions: $p$ is a polynomial of degree $n \ge 2$ and $c_1$, the constant coefficient of $p$, is nonzero.

Input: array [c] of coefficients of $p(x) = c_1 + c_2 x + c_3 x^2 + \cdots + c_{n+1} x^n$; tolerance tol; maximum number of iterations N; initial value $x_0$.

Step 1: Set $m = n$;

Step 2: For $i = 1$ to $n - 2$ do Steps 3-13:

    Step 3: Set $k = 0$; Set $x = x_0$;

    Step 4: While $|x - x_0| >$ tol or $k = 0$ do Steps 5-12:

        Step 5: If $k = N$ then return “Method failed. Not all roots found.”

        Step 6: Set $x_0 = x$;

        Step 7: Set $d_m = c_{m+1}$; $z = d_m$;

        Step 8: For $j = m, m-1, \ldots, 2$ do Step 9

            Step 9: Set $d_{j-1} = x_0 d_j + c_j$; $z = x_0 z + d_{j-1}$;

        Step 10: Set $y = x_0 d_1 + c_1$;

        Step 11: Set $x = x_0 - \frac{y}{z}$;

        Step 12: Set $k = k + 1$;

    Step 13: Set $r_i = x$; [c] = [d]; $m = m - 1$;

Step 14: Set $D = \sqrt{c_2^2 - 4c_1c_3}$; $s_1 = -c_2 + D$; $s_2 = -c_2 - D$;

Step 15: If the real part of $c_2$ is negative, then set $r_{n-1} = \frac{s_1}{2c_3}$ and $r_n = \frac{2c_1}{s_1}$; else set $r_{n-1} = \frac{s_2}{2c_3}$ and $r_n = \frac{2c_1}{s_2}$;

Output: Array $[r_1, r_2, \ldots, r_n]$ of approximate roots.

Steps 4 through 12 implement Newton’s method to find a single root, using Horner’s method in Steps 7 through 10 to calculate the value of the polynomial and its derivative at $x_0$. Care is taken to calculate and store the coefficients [d] of the quotient for easy referral in Step 13. It is assumed that the square root calculated in Step 14 is the principal branch of the complex square root. Steps 14 and 15 utilize an alternate form of the quadratic formula that avoids the subtraction of nearly equal quantities as much as possible.



Crumpet 16: Alternate Quadratic Formula 


When the roots of $p(x) = ax^2 + bx + c$ are small, the numerator of the quadratic formula, $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$, is necessarily small. In this case, it is best to match the signs of $-b$ and $\pm\sqrt{b^2 - 4ac}$ in order to avoid subtracting quantities of nearly equal value. Choosing the sign of the square root term this way gives one of the roots as accurately as possible, but leaves the other root undetermined. Multiplying both numerator and denominator by the conjugate of the numerator gives an alternate expression of the quadratic formula:

$$\frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \cdot \frac{-b \mp \sqrt{b^2 - 4ac}}{-b \mp \sqrt{b^2 - 4ac}} = \frac{b^2 - (b^2 - 4ac)}{2a\left(-b \mp \sqrt{b^2 - 4ac}\right)} = \frac{4ac}{2a\left(-b \mp \sqrt{b^2 - 4ac}\right)} = \frac{2c}{-b \mp \sqrt{b^2 - 4ac}}.$$

Expanding, we have

$$\frac{-b + \sqrt{b^2 - 4ac}}{2a} = \frac{2c}{-b - \sqrt{b^2 - 4ac}}$$

and

$$\frac{-b - \sqrt{b^2 - 4ac}}{2a} = \frac{2c}{-b + \sqrt{b^2 - 4ac}}.$$


However, there is little that can be done at this point if zero happens to be a double root. In this instance, both $c_1$ and $c_2$ will be zero or nearly zero, making both $s_1$ and $s_2$ very small. This is why the set of assumptions includes the stipulation $c_1 \ne 0$. This ensures that zero is not a root of $p$.
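
A small Octave experiment can make the benefit of the alternate quadratic formula concrete. The sketch below is an illustration only: the name quadSketch and the test polynomial $x^2 - 10^8 x + 1$ are assumptions made for this example, and the function assumes $a$ and $c$ are nonzero.

```octave
1;  % marks this file as a script so a function can be defined inline

% Hypothetical sketch: roots of a*x^2 + b*x + c, with a and c nonzero,
% pairing the standard and alternate forms of the quadratic formula.
function [r1, r2] = quadSketch(a, b, c)
  D = sqrt(b^2 - 4*a*c);     % principal branch; complex if b^2 < 4ac
  if real(b) < 0
    s = -b + D;              % -b and D reinforce rather than cancel
  else
    s = -b - D;
  end
  r1 = s / (2*a);            % standard form of the formula
  r2 = (2*c) / s;            % alternate form of the formula
end

[r1, r2] = quadSketch(1, -1e8, 1);
naive = (1e8 - sqrt(1e16 - 4)) / 2;   % standard formula for the small root
printf("large root: %.16g\n", r1);
printf("small root: %.16g\n", r2);
printf("naive small root: %.16g\n", naive);
```

The small root is very nearly $10^{-8}$. The alternate form reproduces it to nearly full precision, while the naive subtraction loses most of its significant digits.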
Newton’s method and polynomials

There is one more issue to address regarding the use of Newton’s method for finding roots of polynomials. For a polynomial with real coefficients, if $x_0$ is real, so will be $x_1$, and $x_2$, and every successive iteration! There will be no hope of finding complex roots. This is not a problem if the polynomial has at most two complex roots. The real roots will be found and the resulting quadratic will hold the two complex roots. The complex roots will be uncovered by the quadratic formula. In general, though, we cannot count on a polynomial having at most two complex roots. Our method should work for polynomials with arbitrarily many complex roots, including the case when all roots are complex.

The fix is not difficult, with one proviso. Mathematically, Newton’s method and Horner’s method work just as well with complex numbers as they do with real numbers. As long as the programming language you are using can handle complex numbers, just begin with a complex (not purely real) initial approximation $x_0$, and complex roots will be found! Even so, it is possible that all the real roots are found first and what remains will be a polynomial with more than two complex roots and no real roots. This is where the inaccuracy of floating point arithmetic is actually helpful! Neither the coefficients nor the value of $x_0$ will be purely real due to round-off error. The complex roots will generally be found.
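
The effect of the initial approximation can be seen in a short Octave experiment. This sketch is an illustration only: the polynomial $x^4 + 5x^2 + 4$ (roots $\pm i$ and $\pm 2i$) and both starting values are chosen for this example, and the built-in polyval() and polyder() stand in for Horner’s method.

```octave
p  = [1 0 5 0 4];            % x^4 + 5x^2 + 4, highest power first
dp = polyder(p);             % coefficients of the derivative
xr = 1;                      % purely real initial approximation
xc = 1 + 1i;                 % complex initial approximation
for k = 1:50
  xr = xr - polyval(p, xr) / polyval(dp, xr);
  xc = xc - polyval(p, xc) / polyval(dp, xc);
end
disp(isreal(xr))             % 1: the real iteration never leaves the real line
disp(xc)                     % a nonreal root of p
disp(abs(polyval(p, xc)))    % residual near zero
```

Since this polynomial has no real roots, the real iteration cannot converge, while the complex iteration settles on one of the four roots.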


Muller’s Method 

Another very fast method for finding roots of equations is Muller’s method. In principle, it is very much like the secant method. With the secant method, two initial approximations $p_0$ and $p_1$ are made. The secant line through the points $(p_0, f(p_0))$ and $(p_1, f(p_1))$ is drawn and its intersection with the $x$-axis gives $p_2$. With Muller’s method, three initial approximations $p_0$, $p_1$, and $p_2$ are needed. The parabola through the points $(p_0, f(p_0))$, $(p_1, f(p_1))$, and $(p_2, f(p_2))$ is drawn and its intersection with the $x$-axis gives $p_3$. There are a couple of issues to deal with, however. First, if the parabola so drawn crosses the $x$-axis at all, it crosses it twice. We need to choose one of the zeros for $p_3$. Second, it is possible the parabola will not cross the $x$-axis at all.

Solving the problem of which root to choose is simple. We assume the approximation $p_2$ is better than the others, so we choose the root that is closest to $p_2$. Actually, that solves the second “problem” too. Even when the parabola does not cross the $x$-axis, it has zeros. They are complex. And we do not worry about that. We simply take the complex root that is closest to $p_2$. This has the nice advantage that even when the coefficients of $p(x)$ are all real and $p_0$, $p_1$, and $p_2$ are all real, and all the roots of $p(x)$ are complex, it will find a complex root.

As to the business of finding the parabola passing through $(p_0, f(p_0))$, $(p_1, f(p_1))$, and $(p_2, f(p_2))$, we will seek a parabola $P(x)$ of the form

$$P(x) = a(x - p_2)^2 + b(x - p_2) + c.$$

Making the substitutions $x = p_i$ and $P(x) = f(p_i)$ leads to the three equations

$$\begin{aligned} f(p_0) &= a(p_0 - p_2)^2 + b(p_0 - p_2) + c \\ f(p_1) &= a(p_1 - p_2)^2 + b(p_1 - p_2) + c \\ f(p_2) &= c \end{aligned}$$

So we find out immediately that $c = f(p_2)$ and we must solve the simultaneous equations

$$\begin{aligned} f(p_0) - f(p_2) &= a(p_0 - p_2)^2 + b(p_0 - p_2) \\ f(p_1) - f(p_2) &= a(p_1 - p_2)^2 + b(p_1 - p_2) \end{aligned}$$

for $a$ and $b$. The solution is

$$b = \frac{(p_0 - p_2)^2 (f(p_1) - f(p_2)) - (p_1 - p_2)^2 (f(p_0) - f(p_2))}{(p_0 - p_2)(p_1 - p_2)(p_0 - p_1)}$$

$$a = \frac{(p_1 - p_2)(f(p_0) - f(p_2)) - (p_0 - p_2)(f(p_1) - f(p_2))}{(p_0 - p_2)(p_1 - p_2)(p_0 - p_1)}$$

Now plugging $a$, $b$, and $c$ into the quadratic formula gives us roots $x = p_2 - \frac{2c}{b \pm \sqrt{b^2 - 4ac}}$. To choose the one closest to $p_2$, we compare $|b + \sqrt{b^2 - 4ac}|$ with $|b - \sqrt{b^2 - 4ac}|$ and use the larger. This gives us the smallest value for $|x - p_2|$, the distance of the root from $p_2$.
For example, we will use Muller’s method with $p_0 = 1$, $p_1 = 2$, and $p_2 = 3$ to find a root of $f(x) = x^3 + 1$. We calculate

$$\begin{aligned} \delta_0 &= f(p_0) - f(p_2) = 2 - 28 = -26 \\ \delta_1 &= f(p_1) - f(p_2) = 9 - 28 = -19 \\ h_0 &= p_0 - p_2 = -2 \\ h_1 &= p_1 - p_2 = -1 \\ h_2 &= p_0 - p_1 = -1 \end{aligned}$$

so we get $c = 28$, $b = \frac{h_0^2 \delta_1 - h_1^2 \delta_0}{h_0 h_1 h_2} = \frac{4(-19) - 1(-26)}{-2} = 25$, and $a = \frac{h_1 \delta_0 - h_0 \delta_1}{h_0 h_1 h_2} = \frac{-1(-26) - (-2)(-19)}{-2} = 6$. A close look at the graphs of $f(x)$ and $P(x) = 6(x - 3)^2 + 25(x - 3) + 28$ shows that they do meet three times (at the required points), and that $P(x)$ does not have real roots:

[Figure: graphs of $f(x)$ and $P(x)$]

$b \pm \sqrt{b^2 - 4ac} = 25 \pm \sqrt{625 - 672} = 25 \pm i\sqrt{47}$. Since $|25 + i\sqrt{47}| = |25 - i\sqrt{47}|$, it does not matter which root we take. Selecting $p_3 = p_2 - \frac{2c}{b - \sqrt{b^2 - 4ac}}$, we get $p_3 = 3 - \frac{56}{25 - i\sqrt{47}} = \frac{11}{12} - \frac{\sqrt{47}}{12}i$. Continuing this process gives the iterates $0.75238 - 0.75810i$, $0.57069 - 0.84288i$, $\ldots$, $0.50000 - 0.86603i$, converging to $\frac{1}{2} - \frac{\sqrt{3}}{2}i$.
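
The computation in this example can be sketched in Octave. The code below is an illustration, not the text’s implementation: the name mullerStep and its variable names are hypothetical, and no safeguards (such as checks for repeated approximations) are included.

```octave
1;  % marks this file as a script so a function can be defined inline

% Hypothetical sketch of a single step of Muller's method, following the
% formulas for a, b, and c derived above.
function p3 = mullerStep(f, p0, p1, p2)
  d0 = f(p0) - f(p2);
  d1 = f(p1) - f(p2);
  h0 = p0 - p2;  h1 = p1 - p2;  h2 = p0 - p1;
  c = f(p2);
  b = (h0^2 * d1 - h1^2 * d0) / (h0 * h1 * h2);
  a = (h1 * d0 - h0 * d1) / (h0 * h1 * h2);
  D = sqrt(b^2 - 4*a*c);     % complex when the parabola misses the x-axis
  if abs(b + D) > abs(b - D)
    den = b + D;             % larger denominator gives the root nearer p2
  else
    den = b - D;             % ties fall here, matching the example's choice
  end
  p3 = p2 - 2*c / den;
end

f = @(x) x.^3 + 1;
p = [1 2 3];                 % p0, p1, p2
for k = 1:6
  p = [p(2) p(3) mullerStep(f, p(1), p(2), p(3))];
  disp(p(3))
end
```

The first value displayed is $p_3$ from the example, and the later values approach $\frac{1}{2} - \frac{\sqrt{3}}{2}i$ even though all three starting values are real.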


Crumpet 17: Orders of convergence 


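The order of convergence of Muller’s method to a simple root (one that is not repeated) is

$$\left(\frac{\sqrt{11}}{3\sqrt{3}} + \frac{19}{27}\right)^{1/3} + \frac{4}{9}\left(\frac{\sqrt{11}}{3\sqrt{3}} + \frac{19}{27}\right)^{-1/3} + \frac{1}{3} \approx 1.839286755214161$$

and to a double root,

$$\left(\frac{\sqrt{139}}{24\sqrt{3}} + \frac{8}{27}\right)^{1/3} + \frac{7}{36}\left(\frac{\sqrt{139}}{24\sqrt{3}} + \frac{8}{27}\right)^{-1/3} + \frac{1}{6} \approx 1.233751928528259.$$
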

The method of Laguerre converges to a simple root with order 3. 
References [23, 26] 


The following chart summarizes the relative strengths and weaknesses of Newton’s method, the secant method, 
and Muller’s method. 
                                         Newton’s   Secant    Muller’s
Initial values needed                    1          2         3
Derivative needed?                       Yes        No        No
Order of convergence                     2          ≈ 1.618   ≈ 1.839
Automatic discovery of complex roots?    No         No        Yes
Simplified in the case of polynomials?   Yes        No        No


Key Concepts 

Synthetic division: A method for dividing a polynomial $p(x)$ by a linear factor $(x - x_0)$ using only addition, multiplication, and the coefficients of $p$. The process is identical to evaluating a polynomial by nesting. Synthetic division simply provides an organizational tool so that nesting can be accomplished simply with pencil and paper.

Horner’s method: A method where the value of a polynomial and its derivative at a single point are calculated simultaneously via synthetic division.

Muller’s method: A root-finding method similar to the secant method where instead of using a secant line a parabola is used.

Deflation: The method of replacing a polynomial $p(x)$ by the product of a linear factor $(x - x_0)$ and a polynomial $q(x)$ of degree one less than that of the original polynomial.


Exercises 

1. * • Write an Octave function that calculates the roots of a quadratic function using the alternate quadratic formula when appropriate. The first line of your function should be

function [r1,r2] = quadraticRoots(a,b,c)

where r1 and r2 are the roots of $p(x) = ax^2 + bx + c$. This way, the values r1 and r2 are returned by the function in an array. The function is called like this:

[s,t]=quadraticRoots(1,2,3),

setting s to the value of one of the roots and t to the other. Test your code well by comparing outputs of your function to hand/calculator computations.

2. * • Write an Octave function that implements Horner’s method. The first line of your function should be

function [p,pprime] = horner(x0,c)

where c is an array containing the coefficients of the polynomial, x0 is the number at which to evaluate it, p is the value of the polynomial at x0, and pprime is the value of the derivative of the polynomial at x0. This way, the values p and pprime are returned by the function in an array. The function is called like this:

[y,yy]=horner(-2,[5,4,3,2,1]),

setting y to the value of the polynomial and yy to the value of its derivative. Test your code well by comparing outputs of your function to hand/calculator computations.

3. * • Write an Octave function that implements Newton’s method with Horner’s method. The first line of your function should be

function x = newtonhorner(c,x0,tol,N)

where c is an array containing the coefficients of the polynomial, x0 is the initial value, tol is the tolerance, and N is the maximum number of iterations before giving up. The code should be similar to code you wrote to implement Newton’s method before, but this code will only work for polynomials. Inside your newtonhorner function, DO NOT write Horner’s method code. Just call the horner function you wrote in question 2. Test your code well by comparing outputs of your function to outputs from the code you wrote in question 1 on page 71.

4. * • Complete the code for the deflate function begun here.

% This function will deflate a polynomial
% given a root.
% INPUT: coefficients c of the polynomial;
%        a root r of the polynomial.
% OUTPUT: coefficients d of the deflated
%         polynomial.
function d = deflate(c,r)

end%function

5. " • Write an Octave function implementing Muller’s 
method. 

6. Use Horner’s method/synthetic division to find $g(2)$ and $g'(2)$. Do not use a computer.

(a) $g(x) = 3x^3 + 12x^2 - 13x - 8$

(b) $g(x) = -7 + 8x - 3x^2 + 5x^3 - 2x^4$

7. Use Horner’s method to calculate $g(-2)$ and $g'(-2)$ where $g(x) = 4x^4 - 5x^3 + 6x - 7$. Do not use a computer.

8. Use your work from question 6 to help execute two iterations of Newton’s method using a pencil, paper, calculator, and Horner’s method/synthetic division. Use initial value $x_0 = 2$.
9. Use your work from question 7 to help execute two iterations of Newton’s method using a pencil, paper, calculator, and Horner’s method/synthetic division. Use initial value $x_0 = -2$.

10. Compute $x_2$ of Newton’s method by hand (using Horner’s method/synthetic division) for $f(x) = x^3 + 4x - 8$ starting with $x_0 = 0$.

11. Find $x_2$ of Newton’s method by hand (using Horner’s method/synthetic division) for $f(x) = x^4 - 2x^3 - 4x^2 + 4x + 4$ using $x_0 = 2$.

12. Using Horner’s method as an aid, and not using your calculator, find the first iteration of Newton’s method for the function $f(x) = 2x^3 - 10x + 1$ using $x_0 = 2$.

13. Demonstrate two iterations of Newton’s method (using Horner’s method/synthetic division) applied to $f(x) = 5x^3 - 2x^2 + 7x - 3$ with $p_0 = 1$ by hand.

14. Find all the roots of the polynomial as follows. Use Newton’s method with tolerance $10^{-5}$ to approximate a root of the polynomial. You may use your newtonhorner function from question 3. Then use synthetic division to deflate the polynomial one degree. Do not use a computer for deflation. Then use Newton’s method with tolerance $10^{-5}$ to approximate a root of the deflated polynomial. Then use synthetic division to deflate the deflated polynomial one degree. Repeat until the deflated polynomial is quadratic. Once this happens, use the quadratic formula (or alternate quadratic formula) to find the last two roots.

(a) $g(x) = x^4 + 6x^3 - 59x^2 + 144x - 144$

(b) $g(x) = -280 + 909x - 154x^2 - 178x^3 + 54x^4 + 9x^5$ [A]

15. Find all the roots of the polynomial as follows. Use Newton’s method with tolerance $10^{-5}$ to approximate a root of the polynomial. You may use your newtonhorner function from question 3. Then use synthetic division to deflate the polynomial one degree. You may use your deflate function from question 4 for deflation. Then use Newton’s method with tolerance $10^{-5}$ to approximate a root of the deflated polynomial. Then use synthetic division to deflate the deflated polynomial one degree. Repeat until the deflated polynomial is quadratic. Once this happens, use the quadratic formula to find the last two roots. You may use your quadraticRoots function from question 1 for solving the quadratic.

(a) $g(x) = x^4 - 2x^3 - 12x^2 + 16x - 40$

(b) $g(x) = 56 - 152x + 140x^2 - 17x^3 - 48x^4 + 9x^5$ [A]

16. For each root you found in question 14 except the first one, use it as an initial approximation in Newton’s method with tolerance $10^{-5}$ to see if you can refine your roots. Do they change?

17. $f(x) = x^3 - 1.255x^2 - 0.9838x + 1.2712$ has a root at $x = 1.12$.

(a) Use Newton’s method with an initial approximation $x_0 = 1.13$ to attempt to find this root. Explain what happens.

(b) Find all the roots of $f(x)$.

18. About 800 years ago John of Palermo challenged mathematicians to find a solution of the equation $x^3 + 2x^2 + 10x = 20$. In 1224, Fibonacci answered the call in the presence of Emperor Frederick II. He approximated the only real root using a geometric technique of Omar Khayyam (1048-1131), arriving at the estimate

How accurate was his approximation? 

Reference [5, pg. 96 ex. 10] 

19. * • Calculate the value of the polynomial at the given value of $x$ in two different ways. (i) Use your horner function from question 2; and (ii) use an inline() function. Then (iii) compare the two results using Octave’s == operator.

(a) $p(x) = x^4 - 2x^3 - 12x^2 + 16x - 40$ at $x = \sqrt{3}$

(b) $q(x) = 56 - 152x + 140x^2 - 17x^3 - 48x^4 + 9x^5$ at $x = \pi/2$

(c) $r(x) = x^6 + 11x^4 - 34x^3 - 130x^2 - 275x + 819$ at

1-U5 [AJ 
2 

(d) $s(x) = 5x^{10} + 3x^8 - 46x^6 - 102x^4 + 365x^2 + 1287$
at - 

e 

20. Write an Octave function that uses your functions from 
questions 1, 3, and 4 to find all the roots of a polyno- 
mial. Test your function well on polynomials of various 
degrees for which you know the roots. You may base 
your function on the pseudo-code on page 84, but your 
code should be significantly simpler since you are call- 
ing functions instead of writing their code. ^ 

21. Use your code from question 20 to find all the solutions of the equation.

(a) $x^5 + 11x^4 - 34x^3 - 130x^2 - 275x + 819 = 0$

(b) $5x^5 + 3x^4 - 46x^3 - 102x^2 + 365x + 1287 = 0$

22. Find all the roots of $g(x) = 25x^3 - 105x^2 + 148x - 174$.

23. Recall that there are some similarities between the secant method and Muller’s method. They each require multiple initial approximations. They each involve calculating the zero of some function passing through these initial points. They both give superlinear convergence to simple roots. And, of course, they are both root finding methods. Let’s tweak the idea in the following way. To find roots of $g$, start as with the secant method, using two approximations, $x_0$ and $x_1$. Then, instead of using the zero of a line through $(x_0, g(x_0))$ and $(x_1, g(x_1))$, find the function of the form

$h(x) = ax^3 + b$

passing through $(x_0, g(x_0))$ and $(x_1, g(x_1))$. Let $x_2$ be the zero of $h$. Then repeat with $x_1$ and $x_2$ to get $x_3$, and so on.

(a) Let $g(x) = 2\ln(1 + x^2) - x$, $x_0 = 5$ and $x_1 = 6$. Find $x_2$ using this method.

(b) Find a formula for $x_2$ given any function $g(x)$ and any initial conditions $x_0$ and $x_1$. Your formula should be in terms of $x_0$, $x_1$, $g(x_0)$, and $g(x_1)$.

(c) Find a general formula for $x_n$ in terms of $x_{n-2}$, $x_{n-1}$, $g(x_{n-2})$, and $g(x_{n-1})$.

(d) * • Write an Octave function that implements this method and prints out each iteration.

(e) * • Use your Octave function to decide whether the order of convergence for this method is linear or superlinear.

24. Pick a function whose root(s) you know exactly. Use 
Muller’s method to find one of the roots. Use three 
consecutive iterations to estimate the order of conver- 
gence. 

25. The errors in three consecutive iterations of Muller’s 
method are shown in the table. Use this information 
to estimate the order of convergence. 


n 

\Xn - X\ 

12 

1.53627(10)“ 34M 

13 

1.67365(10)"^ 

14 

1.83922(10)“ 1131 


26. The graph of $f(x)$ is shown. Find distinct sets of values $p_0$, $p_1$, and $p_2$ for which Muller’s method

(a) will lead to a complex value for $p_3$.

(b) will lead to the root at $x \approx 4.4$.

(c) will lead to the root at $x \approx 2.8$.



27. * • The function shown in question 26 is f(x) = 
X + sin(3:r). Use this information to test your 

conjectures in question 26. 
2.7 Bracketing

Bisection is called a bracketed root-finding method. A root is known to lie within a certain interval. Each iteration 
reduces the size of the interval and maintains the guarantee the root is within. At each step of the algorithm, the 
root is known to be between the latest estimate and one of the previous. These bounds form a bracket around the 
root. As the algorithm proceeds, the bracket decreases in size until it is smaller than some tolerance, at which point 
the root is known to be close and the algorithm stops. 

The problem with bisection is its linear order of convergence. Compared to superlinear methods like the secant 
method and Newton’s method, the bisection method just creeps along. But the bisection method has something 
the secant method and Newton’s method do not — certainty of convergence. Yes, the secant method and Newton’s 
method are fast when they converge, but there is no guarantee they will converge at all. 

Methods combining the virtues of the bisection method (guaranteed convergence) and some higher order method 
(speed) are called safeguarded methods. They are guaranteed to converge and can do so quickly when the root is 
near. Any superlinear method may be bracketed, producing a safeguarded method. 


Bracketing 

Bracketing means maintaining an interval in which a root is known to lie. Bracketing is used in the bisection method. 
With each iteration, the root is known to lie between the two latest approximations. Bracketing is not used in the 
secant method nor Newton’s method. There is no guarantee a root remains near the latest approximations. 

It is not difficult, however, to combine the bisection method with the secant method or Newton’s method, or 
any other high order method for that matter, to form a hybrid method where the root remains bracketed and there 
is a chance for fast convergence. In such a method, a candidate for the next iteration is computed according to the 
high order method. If this candidate lies within the bracket, it becomes the next iteration. If the candidate lies 
outside the bracket, the bisection method is used to compute the next iteration instead. 

Bracketed secant method, better known as the method of false position or regula falsi, provides an elementary 
example. In fact, the high order method (the secant method) always produces a value inside the bracket, so checking 
that point is not necessary. Where false position and the secant method differ is choosing which of the previous 
two iterations to keep. In the secant method, it is always the latest iteration which is kept for the next. In false 
position, the latest iteration which maintains a bracket about the root is kept for the next whether that iteration 
is the latest or not. Bracketed Newton’s method provides a slightly more advanced example because it is entirely 
possible an iteration of Newton’s method will land outside the bracket. 

Take the function $g(x) = 3 - x - \sin(x)$ over the interval $[2,3]$. $g$ is continuous on $[2,3]$, and $g(2) \approx 0.09$ and $g(3) \approx -0.14$ have opposite signs. Thus $[2,3]$ brackets a root of $g$, so let $x_0 = 2$ and $x_1 = 3$. The table shows the computation of the next iteration for bracketed secant method and bracketed Newton’s method.

                      $x_0$   $x_1$   candidate $x_2$                                                        $x_2$
bracketed secant      2       3       $x_1 - g(x_1)\frac{x_1 - x_0}{g(x_1) - g(x_0)} \approx 2.3912$       2.3912
bracketed Newton’s    2       3       $x_1 - \frac{g(x_1)}{g'(x_1)} \approx -11.101$                       2.5

In bracketed secant, the candidate $x_2$ is accepted, but in bracketed Newton’s method, the candidate $x_2$ is outside the bracket so it is discarded and $x_2$ according to the bisection method (2.5) is taken instead.

To set up the next iteration, $g(x_2)$ is calculated. Since $g(x_2)$ is negative in both methods, the old $x_1$, which was 3, is discarded and $x_0 = 2$ is “upgraded” to $x_1$ in order to maintain the bracket. This way, $g$ has opposite signs at $x_1$ and $x_2$. The following table demonstrates this decision process plus the computation of the next iteration.



                      $g(x_2)$     $x_1$   $x_2$     candidate $x_3$                                                        $x_3$
bracketed secant      -0.073141    2       2.3912    $x_2 - g(x_2)\frac{x_2 - x_1}{g(x_2) - g(x_1)} \approx 2.2165$       2.2165
bracketed Newton’s    -0.098472    2       2.5       $x_2 - \frac{g(x_2)}{g'(x_2)} \approx 2.0048$                        2.0048


Can you fill in $x_4$ based on the values in the following table? Notice the old $x_1$ must be “upgraded” in bracketed secant but not in bracketed Newton’s. Why? Answers on page 98.

                      $g(x_3)$     $x_2$   $x_3$     candidate $x_4$                                    $x_4$
bracketed secant      -0.015215    2       2.2165                                                       ?
bracketed Newton’s    0.087906     2.5     2.0048    $x_3 - \frac{g(x_3)}{g'(x_3)} \approx 2.1565$    ?
7 
The next 5 iterations of each method are given here in case you would like to try your hand at computing a few. And now is a good time to do so. These values were computed using the subsequent Octave code.

         bracketed secant     bracketed Newton’s
$x_5$    2.18062942638407     2.17925592233708
$x_6$    2.17988957044102     2.17975682599184
$x_7$    2.17977718322867     2.17975706647997
$x_8$    2.17976012038625     2.17975706648003
$x_9$    2.17975753008587     2.17975706648003


False position (bracketed secant method) Octave code 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Written by Dr. Len Brin          20 May 2014    %
% Purpose: Implementation of the Method of        %
%          False Position.                        %
% INPUT: function g; initial values a and b;      %
%        tolerance TOL; maximum iterations N      %
% OUTPUT: approximation x and number of           %
%         iterations i; or message of failure     %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [x,i] = falsePosition(g,a,b,TOL,N)
  i=1;
  A=g(a);
  B=g(b);
  while (i<N)
    b
    x=b-B*(b-a)/(B-A);
    if (abs(x-b)<TOL)
      return
    end%if
    X=g(x);
    if ((B<0 && X>0) || (B>0 && X<0))
      a=b; A=B;
    end%if
    b=x; B=X;
    i=i+1;
  end%while
  x="Method failed maximum number of iterations reached";
end%function


Bracketed Newton’s method Octave code 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Written by Dr. Len Brin          20 May 2014    %
% Purpose: Implementation of bracketed Newton's   %
%          method.                                %
% INPUT: function g; its derivative gp; initial   %
%        values a and b; tolerance TOL; maximum   %
%        iterations N                             %
% OUTPUT: approximation x and number of           %
%         iterations i; or message of failure     %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [x,i] = bracketedNewton(g,gp,a,b,TOL,N)
  i=1;
  A=g(a);
  B=g(b);
  while (i<N)
    b
    x=b-B/gp(b);
    if (x<min([a,b]) || x>max([a,b]))
      x=b+(a-b)/2;
    end%if
    if (abs(x-b)<TOL)
      return
    end%if
    X=g(x);
    if ((B<0 && X>0) || (B>0 && X<0))
      a=b; A=B;
    end%if
    b=x; B=X;
    i=i+1;
  end%while
  x="Method failed maximum number of iterations reached";
end%function

falsePosition.m and bracketedNewton.m may be downloaded at the companion website. 

The code for the bracketed secant method and the code for bracketed Newton’s method are very similar. In fact, they are nearly identical. There are only two differences besides the commentary at the beginning. Where bracketed secant has the line x=b-B*(b-a)/(B-A); bracketed Newton’s has the line x=b-B/gp(b);. This is the essential difference between the two as this is where the high order method is executed. The only other difference is that bracketed Newton’s includes three lines where it checks whether x lands within the bracket and executes one step of the bisection method if not:

if (x<min([a,b]) || x>max([a,b]))
  x=b+(a-b)/2;
end%if

Actually, we could add these three lines to the bracketed secant method and it would run just the same. It is 
impossible for the secant method to produce a value of x outside the bracket, so the bisection step would never be 
executed. The only essential difference between the two functions is the execution of the high order method. 

We can use this observation to create a sort of blueprint for bracketing any high order method. Steffensen’s, 
Muller’s (as long as the approximation stays real), or Sidi’s (section 3.2), for example, can be bracketed this way. 
The following pseudo-pseudo-code represents such a blueprint, giving guidance on how to safeguard a high order 
method by combining it with bisection. 

Assumptions: g is continuous on [a, b]. g(a) and g(b) have opposite signs.

Input: Interval [a, b]; function g; desired accuracy tol; maximum number of iterations N; any other variables,
like g' in the case of Newton’s method, needed to iterate the superlinear method.

Step 1: Set A = g(a); B = g(b); i = 2;

Step 2: Initialize any other variables needed for superlinear();

Step 3: While i < N do Steps 4-10:

    Step 4: Set x = superlinear(a, b, g, ...);

    Step 5: If (x − a)(x − b) > 0 then set x = b + (a − b)/2;

    Step 6: If |x − b| < tol then return x

    Step 7: Set X = g(x);

    Step 8: If BX < 0 then set a = b; A = B;

    Step 9: Set b = x; B = X; i = i + 1;

    Step 10: Update any other variables needed for superlinear();

Step 11: Print “Method failed. Maximum iterations exceeded.”

Output: Approximation x within tol of exact root, or message of failure.
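
The blueprint can be sketched in Octave by passing the high order step as a function handle. This is only an illustration under stated assumptions: the names bracketedMethod and secantStep are hypothetical (they are not part of the companion code), the handle is assumed to take (a,b,A,B) and return a candidate x, and the sketch covers only methods that carry no extra state, so Steps 2 and 10 are empty.

```octave
% Example data: g has the root sqrt(2) inside the bracket [1,2].
g=@(x) x^2-2;
secantStep=@(a,b,A,B) b-B*(b-a)/(B-A);   % high order step (secant)

% Hypothetical helper following the blueprint; not companion code.
function [x,i] = bracketedMethod(g,step,a,b,TOL,N)
  i=1;
  A=g(a);
  B=g(b);
  while (i<N)
    x=step(a,b,A,B);            % candidate from the high order method
    if ((x-a)*(x-b)>0)          % candidate lies outside the bracket
      x=b+(a-b)/2;              % so take a bisection step instead
    end%if
    if (abs(x-b)<TOL)
      return
    end%if
    X=g(x);
    if (B*X<0)                  % root lies between b and x
      a=b; A=B;
    end%if
    b=x; B=X;
    i=i+1;
  end%while
  x="Method failed maximum number of iterations reached";
end%function

[x,i]=bracketedMethod(g,secantStep,1,2,1e-8,100)
```

Swapping secantStep for a handle such as @(a,b,A,B) b-B/gp(b), with gp a handle for the derivative, would give a bracketed Newton’s method from the same skeleton.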
Figure 2.7.1: A troublesome function for the bracketed secant method.



As motivation for the need to develop bracketed versions of other high order methods, consider the particularly 
problematic function g( x) = It has a root at — but the bracketed secant method can be very slow to 

converge to this root. Figure 2.7.1 illustrates this slow convergence beginning with the bracket [a, b] = [−4, .05].
With this unfortunate choice of bracket, the method takes 45 iterations to achieve 10^-5 accuracy. A smarter
algorithm would not only check that each iterate lands within the brackets, but would also check to see that the 
high order method is making quick progress toward the root. If it detected that convergence was slow, say slower 
than bisection would be, it would take a bisection step instead. Note that bracketed Newton’s method does not 
have a significant problem with this function. Given the same initial bracket, it converges to within 10^-5 of the root
in only 10 iterations (the first 4 of which are bisection steps). Alas, Newton’s method requires use of the derivative. 
A fast bracketed root-finding method that does not require knowledge of the derivative would be quite useful. 

In the early 1970s, Richard Brent built upon the work of van Wijngaarden and Dekker to produce a bracketed 
method that combines bisection, the secant method, and inverse quadratic interpolation, all the while checking 
to make sure the high order method is making sufficiently quick progress toward a root. The result is what is 
now known as Brent’s method [3]. It does not require knowledge of the derivative. It is fast. It is guaranteed to
converge. Consequently, it is a popular all-purpose method for finding a root within a bracket when the derivative 
is not accessible. The full details of Brent’s method will not be presented here, but a significant step toward that 
method will. The method presented here is similar to the MATLAB function fzero [22]. 


Inverse Quadratic Interpolation 

You may recall that Muller’s method needs three initial approximations, say a, b, and c. The parabola through
the points (a, g(a)), (b, g(b)), and (c, g(c)) is drawn and its intersection with the x-axis gives the next iteration. The
key element of this method, the process of fitting a quadratic function to the three points, is called interpolation.
Thus Muller’s method could just as well be called the “quadratic interpolation method”.

As you may have guessed, the method of inverse quadratic interpolation is similar. Instead of fitting a quadratic 
function to the points (a, g(a)), (b, g(b)), and (c, g(c)), the roles of x and y are reversed. A quadratic function is
fitted to the points (g(a), a), (g(b), b), and (g(c), c) instead. Since x is a function of y in this case, the quadratic
will cross the x-axis exactly once, when y = 0. Evaluating the quadratic at 0 gives the next iteration. Figure 2.7.2
shows quadratic interpolation and inverse quadratic interpolation on the same set of three points. In quadratic 
interpolation, y is treated as a function of x. In inverse quadratic interpolation, x is treated as a function of 
y. Inverse quadratic interpolation avoids the main complication of quadratic interpolation — calculating its x-axis 
crossings. In quadratic interpolation, the quadratic may cross the x-axis twice or not at all! Either way, some choice 
needs to be made at every step, and the roots of the quadratic involve the quadratic formula. In inverse quadratic 
interpolation, the quadratic is guaranteed to cross the x-axis exactly once, and finding the crossing is just a matter 
of evaluating the quadratic at 0. That is, y = 0. Remember, the quadratic gives x as a function of y. 

Referring back to the derivation of Muller’s method on page 86, forcing the parabola to pass through the points
(a, A), (b, B), and (c, C), and swapping the roles of x and y, a formula for the inverse parabola, q, just falls out:

\[ q(y) = q_0 (y - B)^2 + q_1 (y - B) + q_2 \]

where

\[
\begin{aligned}
q_2 &= b \\
q_1 &= \frac{(A-B)^2 (c-b) - (C-B)^2 (a-b)}{(A-B)(C-B)(A-C)} \\
q_0 &= \frac{(C-B)(a-b) - (A-B)(c-b)}{(A-B)(C-B)(A-C)}.
\end{aligned}
\]


Crumpet 18 : Quadratic interpolation order of convergence 


The method of inverse quadratic interpolation has order of convergence about 1.84 under reasonable assumptions. 
If the function whose root is being determined has three continuous derivatives in a neighborhood of the root, 
the latest three approximations are sufficiently close, and the root is simple, then the order of convergence is the 
real solution of 

\[ \alpha^3 - \alpha^2 - \alpha - 1 = 0. \]

We can use inverse quadratic interpolation to approximate it! 

>> format('long')
>> f=inline('x^3-x^2-x-1')
f = f(x) = x^3-x^2-x-1
>> [res,i]=inverseQuadratic(f,1,2,10^-12,100)
res = 1.83928675521416
i = 8


The exact solution is 


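\[ \alpha = \frac{1}{3} + \sqrt[3]{\frac{\sqrt{11}}{3\sqrt{3}} + \frac{19}{27}} + \sqrt[3]{-\frac{\sqrt{11}}{3\sqrt{3}} + \frac{19}{27}}. \]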



You may recognize this as the order of convergence for Muller’s method. Indeed, any quadratic interpolation 
method converges to a simple root with this order. 
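
As an independent check on the value 1.839..., the cubic can also be handed to Octave’s built-in roots function and compared with the closed form α = 1/3 + ∛(19/27 + √11/(3√3)) + ∛(19/27 − √11/(3√3)). This is only a sketch: the variable names are illustrative, and the threshold used to pick out the real root is an assumption.

```octave
% Real root of alpha^3 - alpha^2 - alpha - 1 = 0, computed two ways.
r=roots([1 -1 -1 -1]);                 % all three roots; two are complex
alpha=real(r(abs(imag(r))<1e-12))      % keep the real one

% Closed form of the same root, for comparison.
w=sqrt(11)/(3*sqrt(3));
exact=1/3+nthroot(19/27+w,3)+nthroot(19/27-w,3)
```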


Reference [ 


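The x-axis crossing is, therefore,

\[
\begin{aligned}
x &= q(0) \\
  &= B^2 q_0 - B q_1 + q_2 \\
  &= B^2\,\frac{(C-B)(a-b) - (A-B)(c-b)}{(A-B)(C-B)(A-C)} - B\,\frac{(A-B)^2(c-b) - (C-B)^2(a-b)}{(A-B)(C-B)(A-C)} + b \\
  &= \frac{\left[B^2(C-B) + B(C-B)^2\right](a-b) - \left[B^2(A-B) + B(A-B)^2\right](c-b)}{(A-B)(C-B)(A-C)} + b \\
  &= \frac{\left[-B^2C + BC^2\right](a-b) - \left[-B^2A + BA^2\right](c-b)}{(A-B)(C-B)(A-C)} + b \\
  &= \frac{BC(C-B)(a-b) - BA(A-B)(c-b)}{(A-B)(C-B)(A-C)} + b \\
  &= b + \frac{\frac{B}{A}\left(\frac{C}{B}-1\right)(a-b) - \frac{A}{C}\left(1-\frac{B}{A}\right)(c-b)}{\left(1-\frac{B}{A}\right)\left(\frac{C}{B}-1\right)\left(\frac{A}{C}-1\right)} \\
  &= b + \frac{\frac{A}{C}\left(1-\frac{B}{A}\right)(c-b) - \frac{B}{A}\left(\frac{C}{B}-1\right)(a-b)}{\left(\frac{B}{A}-1\right)\left(\frac{C}{B}-1\right)\left(\frac{A}{C}-1\right)}.
\end{aligned}
\]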
Figure 2.7.2: Quadratic and inverse quadratic interpolation.




To make the computation of x a little more programmer-friendly, some new variables are introduced. Let

\[ r = \frac{B}{A} - 1, \qquad s = \frac{C}{B} - 1, \qquad t = \frac{A}{C} - 1 \]

so

\[ x = b - \frac{r(t+1)(c-b) + s(r+1)(a-b)}{rst}. \qquad (2.7.1) \]
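
A quick numerical check of formula 2.7.1 can be reassuring. The sketch below computes one inverse quadratic interpolation step three ways: from the coefficients q_0, q_1, q_2, from r, s, and t, and from Octave’s built-in polyfit with the roles of x and y swapped. The test function and the three starting points are arbitrary choices made for illustration only.

```octave
% One inverse quadratic interpolation step, computed three ways.
g=@(x) x.^3-x.^2-x-1;        % example function (the cubic of Crumpet 18)
a=1; b=2; c=1.8;             % three distinct approximations (illustrative)
A=g(a); B=g(b); C=g(c);

% (1) coefficients of q(y)=q0*(y-B)^2+q1*(y-B)+q2, then x=q(0)
D=(A-B)*(C-B)*(A-C);
q0=((C-B)*(a-b)-(A-B)*(c-b))/D;
q1=((A-B)^2*(c-b)-(C-B)^2*(a-b))/D;
q2=b;
x1=q0*B^2-q1*B+q2

% (2) the programmer-friendly form with r, s, and t
r=B/A-1; s=C/B-1; t=A/C-1;
x2=b-(r*(t+1)*(c-b)+s*(r+1)*(a-b))/(r*s*t)

% (3) independent check: fit x as a quadratic in y and evaluate at y=0
x3=polyval(polyfit([A B C],[a b c],2),0)
```

All three values should agree to within rounding error, and for these points the candidate lands inside the bracket [1, 2].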


Inverse quadratic interpolation can be bracketed just like any other high order method. But it does present an 
interesting question that not all high order methods do. Three points are necessary for a quadratic interpolation, 
so when they are used to produce the next iteration, a fourth point is generated. Of the four points, the computer 
needs to decide which two will become the next bracket, and which point should be the third needed for the next 
interpolation. But we are getting ahead of ourselves. 

Each iteration begins with three points, (a, g(a)), (b, g(b)), and (c, g(c)), where a and b bracket a root and c is a
third point. For the first iteration, only the bracket is given, so c is set equal to a. For every iteration, the signs of
g(a) and g(b) are checked to ensure that a and b bracket a root. If they are opposite, the method proceeds. If they
are the same, that means g(b) and g(c) must have opposite signs, so a is set equal to c. Next, the absolute values of
g(a) and g(b) are checked. If |g(a)| < |g(b)|, the labels of a and b are switched and c is set equal to the new value of
a. After these initial checks, the computation of the next iteration begins with assurance that a root lies between
a and b; b is likely the best estimate of the root to date; and c is likely the worst estimate of the root to date.

If c = a after the initial checks and possible relabeling, then quadratic interpolation is impossible. The next 
iteration is generated by the secant method (linear interpolation) instead. If c ≠ a after the initial checks and
possible relabeling, a candidate for the next iteration, x, is calculated according to inverse quadratic interpolation. 
If the candidate lies within the bracket, it is accepted as the next iteration. If it lies outside the bracket, a step 
of the bisection method is used instead. In either case, c is set equal to b and b is set equal to x. For bracketed 
inverse quadratic interpolation, this completes one iteration. The method is then repeated until a sufficiently good 
approximation is found. 

In the best-case scenario, inverse quadratic interpolation is used at every step and convergence is superlinear
with order about 1.84. In the worst-case scenario, one of the high order methods is used at every step, but the
function is pathological and convergence is slow, possibly even slower than bisection. Slow convergence is rare,
though, and the actual order of convergence cannot be pinned down in general because the method switches between
methods of different orders. The best we can say is it is usually fast.


Bracketed inverse quadratic interpolation Octave code 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Written by Dr. Len Brin          21 May 2014   %
% Purpose: Implementation of bracketed inverse   %
%          quadratic interpolation method.       %
% INPUT: function g; initial values a and b;     %
%        tolerance TOL; maximum iterations NO    %
% OUTPUT: approximation x and number of          %
%         iterations i; or message of failure    %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function [x,i] = bracketedInverseQuadratic(g,a,b,TOL,NO)
  i=1;
  A=g(a);
  B=g(b);
  c=a; C=A;
  while (i<NO)
    b
    if (B*A>0)
      a=c; A=C;
    end%if
    if (abs(A) < abs(B))
      c=b; C=B;
      b=a; B=A;
      a=c; A=C;
    end%if
    if (a==c)
      x=(b*A-a*B)/(A-B);
    else
      r=B/A-1; s=C/B-1; t=A/C-1;
      p=(t+1)*r*(c-b)+(r+1)*s*(a-b);
      q=t*s*r;
      x=b-p/q;
    end%if
    if (x<min([a,b]) || x>max([a,b]))
      x=b+(a-b)/2;
    end%if
    if (abs(x-b)<TOL)
      disp(" ");
      return
    end%if
    c=b; C=B;
    b=x; B=g(b);
    i=i+1;
  end%while
  x="Method failed maximum number of iterations reached";
end%function

Applying the bracketed inverse quadratic interpolation method to the problematic function g(x ) = over the 

interval [−4, .05] yields the result within 10^-5 accuracy in only 11 iterations. The method took only 1 iteration
more than bracketed Newton’s without requiring knowledge of the derivative of g. bracketedInverseQuadratic.m
may be downloaded at the companion website.


Stopping 

In all of our root-finding methods, the algorithm stops when the difference between consecutive iterations is less 
than some tolerance. This criterion is based on the assumption that the error will be no more than this difference. 
And that is a safe assumption for any method that is converging superlinearly when it quits. Indeed, it is even
safe for the linearly converging bisection method where the difference between consecutive iterations is exactly the 
theoretical bound on the error. 

The criterion is not safe when a superlinear method is used far enough from a root that superlinear convergence 
is not observed. This is exactly what happens in figure on page 94. The difference between consecutive iterations 
is actually larger than the absolute error at every step. This is an unusual situation, but it can happen. 

The criterion is also not safe when a method is linearly convergent with a limiting convergence constant λ > 1/2.
However, linearly convergent methods should never be used on their own as there is always a faster alternative. 

There is one more important consideration regarding stopping. Stopping when the difference between consecutive 
iterations is less than some tolerance is dependent on the absolute error. When roots could be very small or very 
large, it is perhaps better to use a criterion based on relative error. Instead of stopping when |x_{n+1} − x_n| < tol, for
example, we would instead stop when |x_{n+1} − x_n| < tol · |x_{n+1}|.
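
To see why the distinction matters for a very small root, here is a minimal sketch that runs plain Newton’s method on g(x) = x^2 − 10^-12, whose positive root is 10^-6, and records where each criterion would stop. The function, the starting value, the tolerance, and all variable names are illustrative choices, not part of the companion code.

```octave
% Newton's method for g(x)=x^2-1e-12 (positive root 1e-6),
% comparing the absolute and relative stopping criteria.
g=@(x) x^2-1e-12;
gp=@(x) 2*x;
tol=1e-6;
x=1;                          % initial approximation, far from the root
n=0; nAbs=0; nRel=0;
while (n<200 && nRel==0)
  xNew=x-g(x)/gp(x);
  n=n+1;
  if (nAbs==0 && abs(xNew-x)<tol)
    nAbs=n; xAbs=xNew;        % where the absolute criterion would stop
  end%if
  if (abs(xNew-x)<tol*abs(xNew))
    nRel=n; xRel=xNew;        % where the relative criterion stops
  end%if
  x=xNew;
end%while
relErrAbs=abs(xAbs-1e-6)/1e-6
relErrRel=abs(xRel-1e-6)/1e-6
```

With these choices the absolute criterion is satisfied while the approximation is still off by several percent of the root, whereas the relative criterion takes a few more iterations and delivers a relative error below the tolerance.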




Key Concepts 

Bracketing: Iteratively refining an interval, also known as the bracket, in which a root is known to lie until it is 
small beyond some tolerance. 

Inverse quadratic interpolation: A quadratic in y is fit to three consecutive approximations of a root. The 
intersection of the quadratic with the x-axis becomes the next iteration. 

Bracketed secant method: A combination of the secant method and bisection method employing bracketing. At 
each iteration, if the secant method produces a value inside the current bracket, it becomes the next iteration. 
Otherwise bisection is used to produce the next iteration. 

False position: Another name for the bracketed secant method. 

Regula falsi: Another name for the bracketed secant method. 

Bracketed Newton’s method: A combination of Newton’s method and the bisection method employing bracketing.
At each iteration, if Newton’s method produces a value inside the current bracket, it becomes the next
iteration. Otherwise bisection is used to produce the next iteration.

Bracketed inverse quadratic interpolation: A combination of inverse quadratic interpolation, the secant method,
and bisection employing bracketing. At each iteration, if inverse quadratic interpolation produces a value
inside the current bracket, it becomes the next iteration. Otherwise either the secant method or bisection is
used to produce the next iteration.


Exercises 

1. Use the bracketed secant method (false position) to find a root in the indicated interval, accurate to within 10^-2.

(a) f(x) = 3 − x − sin x; [2, 3]

(b) g(x) = 3x^4 − 2x^3 − 3x + 2; [0, 1]

(c) g(x) = 3x^4 − 2x^3 − 3x + 2; [0, 0.9]

(d) h(x) = 10 − cosh(x); [−3, −2]

(e) f(t) = √(4 + 5 sin t) − 2.5; [−600, −500] [A]

(f) 5 (i) = 3^*; [3490,3491] 

(g) h(t) = In (3 sin t) - f ; [1,2] 

(h) f(r) = e^(sin r) − r; [−20, 20] [S]

(i) g(r) = sin(e^r) + r; [−3, 3]

(j) h(r) = 2^(sin r) − 3^(cos r); [1, 3]

2. Repeat question 1 using bracketed Newton’s method. [S][A]

3. Repeat question 1 using the secant method. Compare your answer with that of false position.

4. Repeat question 1 using Newton’s method. Compare your answer with that of bracketed Newton’s method. [S][A]

5. Repeat question 1 using Octave and a tolerance of 10^-6. [S][A]

6. Repeat question 2 using Octave and a tolerance of 10^-6. [S][A]

7. Repeat question 1 using Octave, bracketed inverse quadratic interpolation, and a tolerance of 10^-6.

8. Compare the results of questions 5, 6, and 7.

9. Write a bracketed Steffensen’s method Octave function. REMARK: Steffensen’s method is a fixed point
finding method. It solves the equation f(x) = x, not f(x) = 0. So a proper bracket [a, b] is one for which
(f(a) > a and f(b) < b) or (f(a) < a and f(b) > b). Geometrically, this means the points (a, f(a)) and
(b, f(b)) are on opposite sides of the line f(x) = x, analogous to a root-finding bracket where the two points are
on opposite sides of the line f(x) = 0.

10. Use your code from question 9 to repeat question 1 using Octave, bracketed Steffensen’s method, and a
tolerance of 10^-6. Given that you are looking for a root of g(x), use f(x) = g(x) + x in your call to Steffensen’s
method. [S][A]

11. Compare the results of questions 7 and 10.

12. Rewrite the inverseQuadraticInterpolation Octave function so that it stops when the (approximated)
relative error is less than the tolerance.

13. Use your code from question 12 to repeat question 1 with a tolerance of 10^-6.

14. Compare the results of questions 7 and 13.


Answers 

x_4: In both methods, the candidate x_4 is accepted since in each case, x_4 is within the bracket formed by x_2 and
x_3. So, for bracketed secant, x_4 = 2.1854, and for bracketed Newton’s, x_4 = 2.1565. x_1 is upgraded to x_2 in
bracketed secant because g(x_3) is negative. g(x_2) and g(x_3) must have opposite signs in order to maintain
the bracket. x_1 is not upgraded in bracketed Newton’s because g(x_3) is positive.
Interpolation


3.1 A root-finding challenge 

We open this chapter by combining its content with that of the previous chapter. In the present chapter, we will 
discuss interpolating functions (functions whose graphs must contain a prescribed set of points) and interpolation 
(the exercise of finding such a function). In the previous chapter, we discussed approximating roots of functions by 
numerical computation. Putting these ideas together in the present section, we present an interpolating function, 
which we will call f, and challenge the reader to find all 6 roots of f, f', and a particular antiderivative of f as
accurately and efficiently as possible. Graphs of the three functions and the definition of f follow. Should you
accept the challenge, be prepared to use all of what you know about root-finding and Octave. This problem is not
easily solved!

If you would like to get right to it, you can skip most of the content of this section. Use the three graphs and
the Octave code as a starting point to find the roots of F, f, and f'. The rest of the material is here to help you
understand the definition and construction of the functions, but is not prerequisite to taking the challenge. 

The function f and its antiderivative

The function

[Graph of F]

which we will call F, could easily be mistaken for a cubic or higher degree polynomial, but it is far from so nice.
First, its domain is the interval [0, 1], so the graph shown is the entire graph. Second, it has but two derivatives. 
Third, its definition is a touch unusual. More on that soon. 

What we have here is the antiderivative of a fractal interpolating function. An interpolating function is a function
whose graph contains a set of prescribed points. This one happens to be fractal in nature, thus a fractal interpolating
function. The fractal interpolating function, f, passes through

(0, .123), (.33, −.123), and (1, .5) (3.1.1)

in such a way that the graph shown is that of its antiderivative. The unusual nature of the definition of F is derived
from the unusual nature of the definition of f:


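\[
f(x) =
\begin{cases}
f_1 + c_1 \dfrac{x}{\alpha} + d_1 f\!\left(\dfrac{x}{\alpha}\right), & 0 \le x \le \alpha \\[2ex]
f_2 + c_2 \dfrac{x-\alpha}{1-\alpha} + d_2 f\!\left(\dfrac{x-\alpha}{1-\alpha}\right), & \alpha \le x \le 1
\end{cases}
\]

where

\[
\begin{aligned}
f_1 &= \frac{8979}{100000}, & c_1 &= -\frac{34779}{100000}, & d_1 &= \frac{27}{100}, \\
f_2 &= -\frac{75891}{550000}, & c_2 &= \frac{317391}{550000}, & d_2 &= \frac{67}{550}, \\
\alpha &= \frac{33}{100}.
\end{aligned}
\]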

Crumpet 19: Fractal Interpolating Functions 


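Fractal interpolating functions are not restricted to passing through three points. Actually, three is the minimum.
In general, for n ≥ 3, suppose x_1 < x_2 < ··· < x_n. The linear fractal interpolating function (there are other
types of fractal interpolating functions) passing through each of the points

\[ (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n) \]

and having domain [x_1, x_n] is defined by the linear transformations

\[
L_i \begin{pmatrix} x \\ y \end{pmatrix} =
\begin{pmatrix} a_i & 0 \\ c_i & d_i \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix} +
\begin{pmatrix} e_i \\ f_i \end{pmatrix}, \qquad i = 1, 2, \ldots, n-1.
\]

The a_i, c_i, e_i, and f_i are calculated based on the requirement that the function interpolate the given points. In
particular, we require

\[
L_i \begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = \begin{pmatrix} x_i \\ y_i \end{pmatrix}
\quad\text{and}\quad
L_i \begin{pmatrix} x_n \\ y_n \end{pmatrix} = \begin{pmatrix} x_{i+1} \\ y_{i+1} \end{pmatrix}.
\]

The d_i are free parameters with the restriction |d_i| < 1. It is a straightforward algebraic exercise to show

\[
\begin{aligned}
a_i &= \frac{x_{i+1} - x_i}{x_n - x_1} \\
c_i &= \frac{y_{i+1} - y_i - d_i (y_n - y_1)}{x_n - x_1} \\
e_i &= x_i - a_i x_1 \\
f_i &= y_i - c_i x_1 - d_i y_1.
\end{aligned}
\]

In concert, the L_i define the function f, each L_i responsible for the subset [x_i, x_{i+1}] of the domain.

\[
L_i \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a_i x + e_i \\ c_i x + d_i y + f_i \end{pmatrix},
\]

so as L_i takes x to a_i x + e_i, it simultaneously takes y to c_i x + d_i y + f_i.

Noting that L_i takes this action on the function f, we must have that f(a_i x + e_i) = c_i x + d_i f(x) + f_i on [x_1, x_n],
or equivalently,

\[ f(x) = f_i + c_i \left(\frac{x - e_i}{a_i}\right) + d_i f\!\left(\frac{x - e_i}{a_i}\right) \quad\text{on } [x_i, x_{i+1}]. \]

Putting all the pieces together, f is defined by

\[
f(x) =
\begin{cases}
h_1(x), & x_1 \le x \le x_2 \\
h_2(x), & x_2 \le x \le x_3 \\
\quad\vdots \\
h_{n-1}(x), & x_{n-1} \le x \le x_n
\end{cases}
\]

where

\[ h_i(x) = f_i + c_i \left(\frac{x - e_i}{a_i}\right) + d_i f\!\left(\frac{x - e_i}{a_i}\right). \]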
Consequently, F(x) = ∫_{x_1}^{x} f(t) dt is defined by

\[
F(x) =
\begin{cases}
\int_{x_1}^{x} h_1(t)\,dt, & x_1 \le x \le x_2 \\
F(x_2) + \int_{x_2}^{x} h_2(t)\,dt, & x_2 \le x \le x_3 \\
\quad\vdots \\
F(x_{n-1}) + \int_{x_{n-1}}^{x} h_{n-1}(t)\,dt, & x_{n-1} \le x \le x_n
\end{cases}
\]

without qualification, and f'(x) is defined by

\[
f'(x) =
\begin{cases}
h_1'(x), & x_1 \le x \le x_2 \\
h_2'(x), & x_2 \le x \le x_3 \\
\quad\vdots \\
h_{n-1}'(x), & x_{n-1} \le x \le x_n
\end{cases}
\]

as long as f' exists! If |d_i/a_i| < 1 for all i, then the derivative will exist almost everywhere, but will generally be
discontinuous. If we also have h_i'(x_{i+1}) = h_{i+1}'(x_{i+1}) for all i = 1, 2, . . . , n − 2, then the derivative will exist and
will be continuous.


Reference [2, Chapter 6] 
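
The coefficient formulas a_i = (x_{i+1} − x_i)/(x_n − x_1), c_i = (y_{i+1} − y_i − d_i(y_n − y_1))/(x_n − x_1), e_i = x_i − a_i x_1, and f_i = y_i − c_i x_1 − d_i y_1 are easy to try out. The sketch below applies them to the three points 3.1.1 with the scaling factors d_1 = 27/100 and d_2 = 67/550 of this section; the variable names are illustrative only.

```octave
% Coefficients of the linear fractal interpolating function through
% (0,.123), (.33,-.123), (1,.5) with d_1=27/100 and d_2=67/550.
x=[0 .33 1];
y=[.123 -.123 .5];
d=[27/100 67/550];
n=numel(x);
i=1:n-1;                                       % one map L_i per subinterval
a=(x(i+1)-x(i))/(x(n)-x(1))
c=(y(i+1)-y(i)-d.*(y(n)-y(1)))/(x(n)-x(1))
e=x(i)-a*x(1)
f=y(i)-c*x(1)-d*y(1)
```

The results should reproduce the parameters used to define f: a = [α, 1 − α], e = [0, α], c = [c_1, c_2], and f = [f_1, f_2], with f_2 coming out negative.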


The definition of f is self-referential. Its values are defined by, among other terms, values of itself! This makes
evaluating the function a bit different from evaluating a typical function. For example, by virtue of the fact that f
passes through the points 3.1.1, we must have f(0) = .123, f(.33) = −.123, and f(1) = .5, facts we can check easily
enough. According to the definition,

\[ f(0) = f_1 + d_1 f(0) = .08979 + .27 f(0) \]

so f(0) is defined in part by itself. We need to solve the equation f(0) = .08979 + .27 f(0) to find f(0). Thus we
have f(0) = .08979/.73 = .123, as promised. Again according to the definition,

\[ f(1) = f_2 + c_2 + d_2 f(1) = -\frac{75891}{550000} + \frac{317391}{550000} + \frac{67}{550} f(1). \]

Solving for f(1), we have

\[ f(1) = \frac{-\frac{75891}{550000} + \frac{317391}{550000}}{1 - \frac{67}{550}} = \frac{1}{2}, \]

as promised. Since α = .33, the definition actually gives two ways to calculate f(.33). According to the first part of f,

\[
\begin{aligned}
f(.33) = f(\alpha) &= f_1 + c_1 + d_1 f(1) \\
&= \frac{8979}{100000} - \frac{34779}{100000} + \frac{27}{100} \cdot \frac{1}{2} \\
&= -.123.
\end{aligned}
\]

Now is a good time to verify that f(α) = −.123 according to the second part of f as well. Try it! Calculating other
values of f can be a bit more challenging, but there are still a few that are not so bad. α^2 < α and α + (1 − α)α > α,
so

\[
\begin{aligned}
f(\alpha^2) &= f_1 + c_1 \alpha + d_1 f(\alpha) \\
&= \frac{8979}{100000} - \frac{34779}{100000} \cdot \frac{33}{100} + \frac{27}{100} \cdot \left(-\frac{123}{1000}\right) \\
&= -.0581907 \\
f(\alpha + (1-\alpha)\alpha) &= f_2 + c_2 \alpha + d_2 f(\alpha) \\
&= -\frac{75891}{550000} + \frac{317391}{550000} \cdot \frac{33}{100} + \frac{67}{550} \cdot \left(-\frac{123}{1000}\right) \\
&= \frac{2060703}{55000000} \\
&\approx .037467327
\end{aligned}
\]
With a similar level of difficulty, you can now calculate

f(α^3), f(α(α + (1 − α)α)), f(α + (1 − α)α^2),
and f(α + (1 − α)(α + (1 − α)α)).

Answers on page 105. More generally, once you have calculated f(x) for some value x, you can then calculate f(αx)
and f(α + (1 − α)x) from it.
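
This bootstrapping is easy to mimic in Octave. The sketch below encodes the two branches of the definition as anonymous functions and reproduces the values worked out above. The handle names left and right are illustrative, and the parameter values are those of this section (with f_2 negative, as the check f(1) = .5 requires).

```octave
% Values of f obtained from known values by the self-referential definition.
f1=8979/100000;   c1=-34779/100000; d1=27/100;
f2=-75891/550000; c2=317391/550000; d2=67/550;
alpha=33/100;
left =@(x,fx) f1+c1*x+d1*fx;     % gives f(alpha*x) from f(x)
right=@(x,fx) f2+c2*x+d2*fx;     % gives f(alpha+(1-alpha)*x) from f(x)

fa=left(1,.5)                    % f(alpha) from f(1)=.5
fa2=left(alpha,fa)               % f(alpha^2) from f(alpha)
fr=right(alpha,fa)               % f(alpha+(1-alpha)*alpha) from f(alpha)
```

Feeding the new values back into left and right produces still more values of f, one level deeper each time.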

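Now that we have a handle on f, we define F by F(x) = ∫_0^x f(t) dt for all x ∈ [0, 1]. Integrating f(x) we have

\[
F(x) =
\begin{cases}
f_1 x + \dfrac{c_1 x^2}{2\alpha} + \alpha d_1 F\!\left(\dfrac{x}{\alpha}\right), & 0 \le x \le \alpha \\[2ex]
F(\alpha) + f_2 (x - \alpha) + \dfrac{c_2 (x-\alpha)^2}{2(1-\alpha)} + (1-\alpha) d_2 F\!\left(\dfrac{x-\alpha}{1-\alpha}\right), & \alpha \le x \le 1
\end{cases}
\]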

where again both formulas are applicable when x = α. Just like f, F is self-referential. We must go through the same
process in finding values of F as we did finding values of f. To get started, F(0) = α d_1 F(0) ⇒ (1 − α d_1) · F(0) = 0,
but α and d_1 are both less than 1, so 1 − α d_1 ≠ 0. Therefore,

\[ F(0) = \frac{0}{1 - \alpha d_1} = 0. \]


We could have computed this value by integration just as well: F(0) = ∫_0^0 f(t) dt = 0. Now, according to the
formula,

\[ F(1) = F(\alpha) + (1 - \alpha)\left(f_2 + \frac{c_2}{2} + d_2 F(1)\right) \]

and

\[ F(\alpha) = \alpha\left(f_1 + \frac{c_1}{2} + d_1 F(1)\right), \]

a system of two equations in the two unknowns, F(α) and F(1). Its solution is

\[
\begin{aligned}
F(\alpha) &= -\frac{121012947}{6081400000} \approx -.01989886325517151 \\
F(1) &= \frac{5361861}{60814000} \approx 0.0881682014009932.
\end{aligned}
\]

Now that we have the few values, F(0), F(α), and F(1), we can calculate others as before. The values F(αx) and
F(α + (1 − α)x) will both depend on the value of F(x). So we can compute F(α^2) and F(α + (1 − α)α):

\[
\begin{aligned}
F(\alpha^2) &= f_1 \alpha^2 + \frac{c_1 \alpha^3}{2} + \alpha d_1 F(\alpha) \\
&= \frac{10678194456039}{6081400000000000} \\
&\approx .001755877668964219 \\
F(\alpha + (1-\alpha)\alpha) &= F(\alpha) + f_2 (1-\alpha)\alpha + \frac{c_2 (1-\alpha)\alpha^2}{2} + (1-\alpha) d_2 F(\alpha) \\
&= -\frac{94196657189979}{3040700000000000} \\
&\approx -.03097860926430723.
\end{aligned}
\]


Now you can calculate F(α^3), F(α(α + (1 − α)α)), F(α + (1 − α)α^2), and F(α + (1 − α)(α + (1 − α)α)) yourself.
Answers on page 105. You shouldn’t worry about calculating these values exactly. That would require a computer
algebra system with arbitrary precision and is not really the point. The point is to make sure you understand how
to do the calculations. Use a calculator or Octave and the approximate values already calculated.


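The derivative of f and more graphs

The function f has a continuous derivative. In fact, the parameters defining f were specifically chosen so the
derivative would exist and be continuous. Differentiating f gives us

\[
f'(x) =
\begin{cases}
\dfrac{c_1}{\alpha} + \dfrac{d_1}{\alpha} f'\!\left(\dfrac{x}{\alpha}\right), & 0 \le x \le \alpha \\[2ex]
\dfrac{c_2}{1-\alpha} + \dfrac{d_2}{1-\alpha} f'\!\left(\dfrac{x-\alpha}{1-\alpha}\right), & \alpha \le x \le 1
\end{cases}
\]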
Figure 3.1.1: Graph of f.

Figure 3.1.2: Graph of f'.



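and we can check as before that the definition is consistent when x = α:

\[
\begin{aligned}
f'(0) &= \frac{c_1}{\alpha} + \frac{d_1}{\alpha} f'(0) \;\Rightarrow\; f'(0) = \frac{c_1}{\alpha - d_1} = -\frac{11593}{2000} = -5.7965 \\
f'(1) &= \frac{c_2}{1-\alpha} + \frac{d_2}{1-\alpha} f'(1) \;\Rightarrow\; f'(1) = \frac{c_2}{1 - \alpha - d_2} = \frac{105797}{100500} \approx 1.052706467661692 \\
f'(\alpha) &= \frac{c_1}{\alpha} + \frac{d_1}{\alpha} f'(1) = -\frac{141949}{737000} \approx -.1926037991858887 \\
f'(\alpha) &= \frac{c_2}{1-\alpha} + \frac{d_2}{1-\alpha} f'(0) = -\frac{141949}{737000} \approx -.1926037991858887.
\end{aligned}
\]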


Other values of f' can be computed as done for f and F. The graphs of f and f' are shown in Figures 3.1.1 and
3.1.2.

That’s it. Now see if you can find the roots of the three functions. The following Octave code will help you 
evaluate the functions at any points, a real time saver! 


Octave 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Written by Dr. Len Brin                 19 February 2014  %
% Purpose: Calculate values of the fractal interpolating    %
%          function, f, passing through                     %
%          (0,f_0), (alpha,f_alpha), and (1,f_1),           %
%          its derivative and its integral.                 %
% INPUT: value at which to evaluate, x; array of values,    %
%        f = [f_0,f_alpha,f_1]; alpha; scaling factors      %
%        d1 and d2.                                         %
% OUTPUT: y=f'(x); yy=f(x); yyy=F(x).                       %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [y,yy,yyy] = fractalInterpolator(x,f,alpha,d1,d2)
  f1=f(1)*(1-d1);
  c1=f(2)-d1*f(3)-f1;
  f2=f(2)-d2*f(1);
  c2=(1-d2)*f(3)-f2;
  F1=(alpha*(f1+c1/2)+(1-alpha)*(f2+c2/2))/(1-(1-alpha)*d2-alpha*d1);
  FA=alpha*(f1+c1/2+d1*F1);
  l=0;
  r=1;
  a=[];
  if (alpha>1/2)
    its=floor(log(10^-16)/log(alpha));
  else
    its=floor(log(10^-16)/log(1-alpha));
  end%if
  for i=1:its
    if (alpha>1/2)
      h = (r-l)*alpha;
      m = l+h;
    else
      h = (r-l)*(1-alpha);
      m = r-h;
    end%if
    if (x<m)
      a(i)=0;
      r=m;
    else
      a(i)=1;
      l=m;
    end%if
  end%for
  x=0;
  y=c1/(alpha-d1);
  yy=f(1);
  yyy=0;
  for i=its:-1:1
    if (a(i)==0)
      y=(c1+d1*y)/alpha;
      yy=c1*x+d1*yy+f1;
      yyy=alpha*(f1*x+c1/2*x*x+d1*yyy);
      x=alpha*x;
    else
      y=(c2+d2*y)/(1-alpha);
      yy=c2*x+d2*yy+f2;
      yyy=FA+(1-alpha)*(f2*x+c2/2*x*x+d2*yyy);
      x=alpha+(1-alpha)*x;
    end%if
  end%for
end%function


fractalInterpolator.m may be downloaded at the companion website.
Answers

Evaluating f: The following are a few values of f:

f(α^3) ≈ .03620418000000000

f(α(α + (1 − α)α)) ≈ −.09176089063636364

f(α + (1 − α)α^2) ≈ −.08222890363636364

f(α + (1 − α)(α + (1 − α)α)) ≈ .1846063473223140.

Evaluating F: The following are a few values of F:

F(α^3) ≈ .002702687013731212

F(α(α + (1 − α)α)) ≈ −.003859289400223274

F(α + (1 − α)α^2) ≈ −.02753062961856850

F(α + (1 − α)(α + (1 − α)α)) ≈ −.01466250212441314.
3.2 Lagrange Polynomials


A function that is required to have a graph passing through some set of prescribed points is called an interpolating 
function, and we say that such a function interpolates the prescribed points. Further, the exercise of finding such 
a function is called interpolation. 

In exercise 3a of section 2.5, you are asked to find a polynomial with roots at −7, 2, and 1 ± 5i (and no others).
The function, therefore, must be a polynomial and have a graph passing through the points

(−7, 0), (2, 0), (1 + 5i, 0), and (1 − 5i, 0). (3.2.1)

In retrospect, then, the question could have been phrased as: find a polynomial passing through the points 3.2.1
(and not having any roots besides −7, 2, 1 + 5i, and 1 − 5i), a question of interpolation. We now expand upon this
idea by considering polynomials with graphs passing through points with arbitrary ordinates (not just 0).

We start on familiar ground. The polynomial p(x) = (x + 7)(x − 2) has roots −7 and 2 so has a graph passing
through (−7, 0) and (2, 0). Suppose we want to modify p so it also passes through (−1, 1). That is, we want
p(−7) = 0, p(−1) = 1, and p(2) = 0. Beginning with p(x) = (x + 7)(x − 2), we already have p(−7) = 0 and p(2) = 0,
so really we only need to concentrate on p(−1) = 1. As is, p(−1) = (−1 + 7)(−1 − 2) = 6(−3) = −18, a far cry
from 1. But p(x) = (x + 7)(x − 2) is not the only polynomial passing through (−7, 0) and (2, 0). Let a be any
real number and note that q(x) = a(x + 7)(x − 2) also passes through (−7, 0) and (2, 0). If we choose a such that
q(−1) = 1, we have the desired function:

\[ q(-1) = a(-1 + 7)(-1 - 2) = -18a = 1 \;\Rightarrow\; a = -\frac{1}{18}. \]

q(x) = −(1/18)(x + 7)(x − 2) passes through all three of the points, (−7, 0), (2, 0), and (−1, 1). But let us not lose
sight of whence this came. −1/18 = 1/p(−1) so, actually, the desired function can be written as q(x) = p(x)/p(−1). Indeed,

\[ q(-7) = \frac{p(-7)}{p(-1)} = 0, \quad q(2) = \frac{p(2)}{p(-1)} = 0, \quad\text{and}\quad q(-1) = \frac{p(-1)}{p(-1)} = 1. \]

Now suppose we want a polynomial passing through (−7, 0), (2, 0), and (−1, √2). As before, we know p(x) =
(x + 7)(x − 2) has the desired roots and q(x) = p(x)/p(−1) has the nice feature that q(−1) = 1. We use these two facts
to come up with an answer. In fact, without doing any calculation, we know the polynomial

\[ l(x) = \sqrt{2} \cdot \frac{p(x)}{p(-1)} \]

is the desired function. Take a moment to check that l(−7) = 0, l(2) = 0, and l(−1) = √2, and understand its
construction. This idea is the seed for what is called the Lagrange form of interpolating polynomials.

We are now ready to let the ordinates fly! Suppose we would like a polynomial passing through (−7, y_1),
(2, y_2), and (−1, y_3). We know the polynomial p_3(x) = (x + 7)(x − 2) has zeros at −7 and 2, so the polynomial
l_3(x) = (p_3(x)/p_3(−1)) y_3 has zeros at −7 and 2 and, conveniently, l_3(−1) = y_3. This is a good first step. It has the
correct ordinate at −1 and zeros at −7 and 2. Similarly, we can construct the polynomial p_2(x) = (x + 7)(x + 1) with
zeros at −7 and −1, from which we can construct the polynomial l_2(x) = (p_2(x)/p_2(2)) y_2 with zeros at −7 and −1 and,
conveniently, l_2(2) = y_2. This is a good second step. It has the correct ordinate at 2 and zeros at −7 and −1. Now
consider the sum (l_3 + l_2). l_3(−1) = y_3 and l_2(−1) = 0, so (l_3 + l_2)(−1) = y_3. Similarly, l_3(2) = 0 and l_2(2) = y_2, so
(l_3 + l_2)(2) = y_2. Moreover, (l_3 + l_2)(−7) = 0. We now have a polynomial passing through two of the three required
points and having a zero at the abscissa of the third point. If we had a polynomial with the correct ordinate at −7
and zeros at 2 and −1, we could add it to the sum and be done. But this is exactly the type of polynomial we have
been constructing! We let p_1(x) = (x + 1)(x − 2) and l_1(x) = (p_1(x)/p_1(−7)) y_1 and note that l_1 has the correct ordinate at
−7 and zeros at 2 and −1, just as we needed. Finally, the desired polynomial is (l_1 + l_2 + l_3). Table 3.1 summarizes
the construction.

And now we are ready for complete generalization. Suppose $n \geq 1$ and $x_0, x_1, \ldots, x_n$ are $n + 1$ distinct real numbers.
We use the notation $P_n(x)$ for the polynomial of least degree interpolating the points

$$(x_0, y_0), (x_1, y_1), \ldots, (x_n, y_n).$$

Setting $p_i(x) = \prod_{j=0,\, j \neq i}^{n} (x - x_j) = (x - x_0) \cdots (x - x_{i-1})(x - x_{i+1}) \cdots (x - x_n)$, one formula for $P_n$ is

$$L_n(x) = \sum_{i=0}^{n} \frac{p_i(x)}{p_i(x_i)}\, y_i. \tag{3.2.2}$$

Table 3.1: A polynomial passing through $(-7, y_1)$, $(2, y_2)$, and $(-1, y_3)$.

    x     l_1(x) = p_1(x)/p_1(-7) y_1    l_2(x) = p_2(x)/p_2(2) y_2    l_3(x) = p_3(x)/p_3(-1) y_3    (l_1 + l_2 + l_3)(x)
    -7    y_1                            0                             0                              y_1
    2     0                              y_2                           0                              y_2
    -1    0                              0                             y_3                            y_3


As written, $L_n$ is called the Lagrange form of $P_n$. For sake of brevity, it is often called the Lagrange interpolating
polynomial, or even Lagrange polynomial. However, the interpolating polynomial of least degree by any other name
would be but $P_n$. We will adhere to the practice of calling it the interpolating polynomial of least degree, or use
the notation $P_n$, when the form is unimportant and will add the phrase Lagrange form, or use the notation $L_n$,
when it is.

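The Lagrange form translates almost directly into a loop. The following is a minimal Octave sketch, not code from the companion website: the sample data ($f(x) = e^x$ at $x = 0, 1, 2$), the evaluation point, and all variable names are assumptions chosen for illustration.

```octave
% Sketch: evaluate the Lagrange form L_n at a single point xhat.
% The data and variable names below are illustrative assumptions.
x = [0 1 2];        % abscissas x_0, ..., x_n (must be distinct)
y = exp(x);         % ordinates y_i = f(x_i); here f(x) = e^x
xhat = 1.5;         % point at which L_n is evaluated

m = numel(x);       % number of data points, n + 1
L = 0;
for i = 1:m
  others = x([1:i-1, i+1:m]);      % every abscissa except x_i
  p_at_xhat = prod(xhat - others); % p_i(xhat)
  p_at_xi   = prod(x(i) - others); % p_i(x_i), nonzero for distinct abscissas
  L = L + p_at_xhat / p_at_xi * y(i);
end
printf("L_%d(%g) = %.15f\n", m - 1, xhat, L);
```

Each pass through the loop builds one term $\frac{p_i(x)}{p_i(x_i)} y_i$, so the cost grows like $(n+1)^2$ multiplications per evaluation point.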
The main use for interpolating polynomials in numerical analysis is to approximate non-polynomial functions in
the following way. Suppose we know the value of $f$ at a selection of points. That is, we know $f(x_0) = y_0, f(x_1) =
y_1, \ldots, f(x_n) = y_n$ and perhaps not much more. The interpolating polynomial of least degree passing through the
$n + 1$ points

$$(x_0, y_0), (x_1, y_1), \ldots, (x_n, y_n)$$

will, by construction, agree with $f$ at $x_0, x_1, \ldots, x_n$ and we can say with some precision how closely this interpolating
polynomial agrees with $f$ at other points as well. The values of the interpolating polynomial at these “other points”
are what we refer to as approximations of the non-polynomial function.

Setting $a = \min(x_0, \ldots, x_n, x)$ and $b = \max(x_0, \ldots, x_n, x)$, we have the following result. If $f$ has $n + 1$ derivatives
on $(a, b)$ and $f, f', \ldots, f^{(n)}$ are all continuous on $[a, b]$, then there is a value $\xi_x \in (a, b)$ such that

$$f(x) - P_n(x) = \frac{f^{(n+1)}(\xi_x)}{(n+1)!}(x - x_0)(x - x_1) \cdots (x - x_n). \tag{3.2.3}$$


Ironically, this result is proven by considering the Lagrange form of an interpolating polynomial in $t$ that is equal
to the error at $x$ and equal to zero at each $x_i$. That polynomial is

$$\Lambda(t) = [P_n(x) - f(x)]\, \frac{(t - x_0)(t - x_1) \cdots (t - x_n)}{(x - x_0)(x - x_1) \cdots (x - x_n)}.$$


Crumpet 20: $\Lambda$


$\Lambda$ is the (capital) eleventh letter of the Greek alphabet and is pronounced lam-duh. The lower case version, $\lambda$,
appears much more commonly in mathematics and often represents an eigenvalue.


Subtracting this polynomial from the error, $e(t) = P_n(t) - f(t)$, we have a function,

$$g(t) = e(t) - \Lambda(t),$$

that is zero for all $t = x_0, x_1, \ldots, x_n, x$. Since $g, g', \ldots, g^{(n)}$ are all continuous on $[a, b]$ and $g^{(n+1)}$ exists on $(a, b)$,
by Generalized Rolle’s Theorem, there is a value $\xi_x \in (a, b)$ such that $g^{(n+1)}(\xi_x) = 0$. On the other hand,

$$g^{(n+1)}(\xi_x) = e^{(n+1)}(\xi_x) - \Lambda^{(n+1)}(\xi_x) = P_n^{(n+1)}(\xi_x) - f^{(n+1)}(\xi_x) - \Lambda^{(n+1)}(\xi_x),$$

and $P_n$ is a polynomial of degree at most $n$. Hence, $P_n^{(n+1)}(t) = 0$ for all $t$ and we have $g^{(n+1)}(\xi_x) = -f^{(n+1)}(\xi_x) -
\Lambda^{(n+1)}(\xi_x) = 0$. It follows that

$$f^{(n+1)}(\xi_x) = -\Lambda^{(n+1)}(\xi_x).$$

But, $\Lambda$ is a polynomial of degree $n + 1$ in $t$, so its $(n + 1)$st derivative with respect to $t$ is constant with respect to
$t$. We write $\Lambda$ as

$$\Lambda(t) = \frac{P_n(x) - f(x)}{(x - x_0)(x - x_1) \cdots (x - x_n)} \left[ t^{n+1} + b_n t^n + \cdots + b_0 t^0 \right]$$

for some constants $b_n, b_{n-1}, \ldots, b_0$, and consequently,

$$\Lambda^{(n+1)}(t) = \frac{P_n(x) - f(x)}{(x - x_0)(x - x_1) \cdots (x - x_n)} \cdot (n + 1)!,$$

and we have, by substitution,

$$f^{(n+1)}(\xi_x) = \frac{f(x) - P_n(x)}{(x - x_0)(x - x_1) \cdots (x - x_n)} \cdot (n + 1)!$$

or, equivalently,

$$\frac{f^{(n+1)}(\xi_x)}{(n + 1)!}(x - x_0)(x - x_1) \cdots (x - x_n) = f(x) - P_n(x)$$

as desired.

Figure 3.2.1 shows interpolating polynomials for three different functions. The x-coordinates of the prescribed 
points are the same for each interpolating polynomial. The x-coordinates are 


0, .1951846177977887, .3554400571592862, .4823905248516196, .9138095996128959, and 1. 


The four numbers between 0 and 1 were selected by a random number generator. The interpolating polynomial 
closely resembles the function only in the first case. The sixth derivative of $f$ helps explain why.

Our error term,

$$\frac{f^{(6)}(\xi_x)}{6!}(x - x_0)(x - x_1) \cdots (x - x_5),$$

implies that the sixth derivative of $f$ and the polynomial $h(x) = \frac{(x - x_0)(x - x_1) \cdots (x - x_5)}{6!}$ determine how much $f$ and
$L_5$ will differ. By bounding both $|f^{(6)}|$ and $|h|$ over the interval $[0, 1]$, we can get a bound on the difference between
$f$ and $L_5$. The graphs of $f^{(6)}$ are shown in Figure 3.2.1. The graph of $h$ is

[Figure: graph of $h$ over the interval $[0, 1]$.]

so $\max_{x \in [0,1]} |h(x)|$ occurs around 0.75. We can use a root-finding method applied to $h'$ to find that the maximum
of $|h|$ is approximately $h(.7409254943919) \approx 2.506891519629(10)^{-6}$, a relatively small number. On the other hand,
for $f(x) = e^{\sin((x+1)^2)}$, we find $\max_{x \in [0,1]} |f^{(6)}(x)| \approx |f^{(6)}(.6777170541644)| \approx 44013.74605321$, a relatively large
number. Their product,

$$\max_{x \in [0,1]} |h(x)| \cdot \max_{x \in [0,1]} \left| f^{(6)}(x) \right| \approx .11,$$

gives a bound on the error. The furthest $L_5$ can be from $f$ over the interval $[0, 1]$ is about 0.11, a relatively small
number. The actual error is considerably smaller, so it can barely be noticed in the top left graph of Figure 3.2.1.

Figure 3.2.1: Three interpolating functions. From top to bottom, $e^{\sin((x+1)^2)}$, $\sin(e^{(x+1)^2})$, and a fractal function
as defined in section 3.1. $f$ is shown in black and the interpolant, $L_5$, in red.

[Figure panels: left column, $f(x)$ and $L_5(x)$; right column, $f^{(6)}(x)$, which is undefined for the fractal function.]

For $f(x) = \sin(e^{(x+1)^2})$, we find $\max_{x \in [0,1]} |f^{(6)}(x)| \approx |f^{(6)}(1)| \approx 8.552147927657737(10)^{13}$, a relatively large
number. This time the product,

$$\max_{x \in [0,1]} |h(x)| \cdot \max_{x \in [0,1]} \left| f^{(6)}(x) \right| \approx 2.1439307114460004(10)^{8},$$

is a huge number relative to the values of $f$. So the theoretical error bound does not predict good results for
this interpolation. In fact, it suggests that the interpolation could have been much, much worse! $L_5$ might have
differed from $f$ by over 200 million, a fact that should be worrisome considering $f$ takes values between $-1$ and 1. An
approximation that is off by even 1 is completely useless for this particular $f$. As it is, we should not be surprised
that $L_5$ is not a good approximation of $f$ since the error term can be quite large. Nonetheless, the method is sound.
Failure to approximate $f$ well should not be seen as a flaw in the method, but rather a flaw in its application. If
we really wanted to approximate $f$ well, we would need to find a different set of points over which to interpolate.

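The size of $h$ can be checked without any root-finding by sampling it on a fine grid. The sketch below is one hedged way to do so in Octave; the grid resolution and variable names are assumptions, and the two maxima of $|f^{(6)}|$ are copied from the text rather than recomputed.

```octave
% Sketch: bound the interpolation error on [0,1] by sampling |h| on a grid.
xs = [0, .1951846177977887, .3554400571592862, ...
      .4823905248516196, .9138095996128959, 1];   % abscissas from the text
t = linspace(0, 1, 100001);                       % sample grid (an assumption)
h = ones(size(t));
for k = 1:numel(xs)
  h = h .* (t - xs(k));                           % (x - x_0)...(x - x_5)
end
h = h / factorial(numel(xs));                     % divide by 6!
[hmax, idx] = max(abs(h));
tmax = t(idx);

% Maxima of |f^(6)| on [0,1], copied from the text rather than recomputed.
M1 = 44013.74605321;            % first function in Figure 3.2.1
M2 = 8.552147927657737e13;      % second function in Figure 3.2.1
bound1 = hmax * M1;
bound2 = hmax * M2;
printf("max|h| = %.6e near x = %.4f\n", hmax, tmax);
printf("error bounds: %.3g and %.3g\n", bound1, bound2);
```

A grid search only approximates the maximum, but for a smooth polynomial like $h$ the discrepancy is far below the digits that matter for an error bound.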
For the fractal function in the bottom left of Figure 3.2.1, our error estimate is entirely irrelevant. The sixth
derivative of $f$ does not exist. In fact, even the first derivative of $f$ does not exist. We have no way to estimate
the error except to look at the graphs. And as we see, $L_5$ again does a very poor job of approximating $f$. Failure,
again, should not be seen as a flaw in the method, but rather in its application. Approximating a function with an
interpolating polynomial presumes that the function has sufficient derivatives.


Crumpet 21: Bernstein polynomials


Suppose $f$ is a continuous function on the interval $[0, 1]$, and define the polynomial

$$B_n(x) = \sum_{k=0}^{n} \binom{n}{k} f\!\left(\frac{k}{n}\right) x^k (1 - x)^{n-k}, \quad n = 1, 2, 3, \ldots$$

Then

$$\lim_{n \to \infty} B_n(x) = f(x)$$

uniformly. That is, $\lim_{n \to \infty} \max\{ |B_n(x) - f(x)| : x \in [0, 1] \} = 0$. The $B_n$ are Bernstein polynomials. Shown
below are $B_4$, $B_{20}$, $B_{100}$, and $B_{500}$ for the fractal function in Figure 3.2.1.

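To get a feel for how slowly Bernstein polynomials converge, here is a small Octave sketch that evaluates $B_n(x)$ straight from the definition. The test function $f(x) = x^2$, the degree, and the evaluation point are assumptions chosen because, for this $f$, it is known that $B_n(x) = x^2 + \frac{x(1-x)}{n}$, which makes the result easy to check.

```octave
% Sketch: evaluate the Bernstein polynomial B_n at x straight from its definition.
f = @(x) x.^2;      % sample continuous function on [0,1] (an assumption)
n = 10;
x = 0.3;
B = 0;
for k = 0:n
  B = B + nchoosek(n, k) * f(k / n) * x^k * (1 - x)^(n - k);
end
printf("B_%d(%g) = %.12f, f(%g) = %.12f\n", n, x, B, x, f(x));
```

The gap between $B_n(x)$ and $f(x)$ shrinks only like $1/n$ here, which is why degrees in the hundreds appear in the crumpet.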


An application of interpolating polynomials 

Again we find ourselves connecting the content of the previous chapter with that of the current. The secant method
is actually an application of interpolating polynomials to root-finding. The secant line whose slope is used to
calculate any given iteration can be viewed as an interpolating line! It passes through two points lying on $g$. Hence,
it is an approximation of $g$.

Having taken this point of view, we can now imagine generalizing the method by using the derivative of a higher
degree interpolating polynomial to approximate $g'$ at each step. Such a generalized method, which we will call
Sidi’s $k$th degree method [30], is summarized by the formula

$$x_{n+1} = x_n - \frac{g(x_n)}{p'_{n,k}(x_n)}$$

where $p_{n,k}$ is the interpolating polynomial passing through the points

$$(x_n, g(x_n)), (x_{n-1}, g(x_{n-1})), \ldots, (x_{n-k}, g(x_{n-k})).$$

When $k = 1$, this is exactly the secant method. When $k = 2$, this method uses the same parabola as does Muller’s
method, but in a different way. In Muller’s method, the next iteration is found by locating a root of the interpolating
polynomial. In this method, the next iteration is found by locating a root of a tangent line to the interpolating
polynomial.

As $k$ increases, more initial values are needed, but the order of convergence increases as a benefit. Letting $\alpha_k$
be the order of convergence of Sidi’s $k$th degree method, we have $\alpha_1 = \frac{1 + \sqrt{5}}{2} \approx 1.618$, the order of convergence of
the secant method, and

$$\alpha_2 \approx 1.839, \quad \alpha_3 \approx 1.928, \quad \alpha_4 \approx 1.966.$$

For any $k$, Sidi’s method has an order of convergence less than 2 (the order of convergence of Newton’s method)
but it approaches 2 as $k$ increases.

At this point, you might wonder just how practical such a method might be. After all, calculating a new 
Lagrange interpolating polynomial and evaluating its derivative at each step can be a cumbersome process. We will 
take up this issue in the next section. 
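For the curious, one iteration scheme of this kind can be sketched in a few lines. The Octave sketch below is an illustration under stated assumptions, not the author's implementation: it takes $k = 2$, uses the sample function $g(x) = x^3 - 2$ with made-up starting values, and obtains the derivative of the interpolating polynomial at the newest iterate from divided differences (a tool developed in section 3.3) instead of building a Lagrange form.

```octave
% Sketch of Sidi's method with k = 2, applied to g(x) = x^3 - 2.
% p'_{n,k}(x_n) is computed from divided differences of the last k+1 iterates.
g = @(x) x.^3 - 2;      % sample function (an assumption)
k = 2;
xs = [2, 1.5, 1];       % k + 1 starting values; the latest iterate is xs(end)
for iter = 1:50
  xn = xs(end);
  if abs(g(xn)) < 1e-12
    break;
  end
  nodes = xs(end:-1:end-k);      % x_n, x_{n-1}, ..., x_{n-k}
  c = g(nodes);
  for j = 1:k                    % c(i) becomes g[nodes(1), ..., nodes(i)]
    for i = k+1:-1:j+1
      c(i) = (c(i) - c(i-1)) / (nodes(i) - nodes(i-j));
    end
  end
  dp = 0;                        % p'(x_n) = c(2) + c(3)(x_n - x_{n-1}) + ...
  w = 1;
  for j = 1:k
    dp = dp + c(j+1) * w;
    w = w * (xn - nodes(j+1));
  end
  xs(end+1) = xn - g(xn) / dp;   % Sidi update
end
printf("root estimate %.15f after %d steps\n", xs(end), numel(xs) - (k + 1));
```

Changing `k` and supplying `k + 1` starting values is all that is needed to try other degrees; with `k = 1` the loop reduces to the secant method.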


Neville’s Method 


The Lagrange form of an interpolating polynomial is as convenient as it gets for a human. With a little care and 
patience, it is possible to write down such a polynomial without even the aid of a calculator. However, adding 
points to the interpolation and evaluating the polynomial for non-interpolated points can be cumbersome tasks. 
Consider a simple example: the polynomial interpolating $f(x) = e^x$ at $x = 0, 1, 2$:

$$L_2(x) = \frac{(x - 1)(x - 2)}{(0 - 1)(0 - 2)} e^0 + \frac{(x - 0)(x - 2)}{(1 - 0)(1 - 2)} e^1 + \frac{(x - 0)(x - 1)}{(2 - 0)(2 - 1)} e^2
= \frac{(x - 1)(x - 2)}{2} - x(x - 2)\, e + \frac{x(x - 1)}{2} e^2.$$

Evaluating $L_2(1.5)$, for example, requires either

1. computing the values of the three separate terms, each a quadratic polynomial, and adding:

$$L_2(1.5) = \frac{(1.5 - 1)(1.5 - 2)}{2} - 1.5(1.5 - 2)\, e + \frac{1.5(1.5 - 1)}{2} e^2
= -.125 + .75e + .375e^2 \approx 4.684607408443278$$

or

2. the unpleasant business of simplifying $L_2$ into a simpler form and then evaluating:

$$L_2(x) = \frac{(x - 1)(x - 2)}{2} - x(x - 2)\, e + \frac{x(x - 1)}{2} e^2
= \frac{1}{2}(x^2 - 3x + 2) - e(x^2 - 2x) + \frac{e^2}{2}(x^2 - x)$$
$$= \left( \frac{1}{2} - e + \frac{e^2}{2} \right) x^2 + \left( -\frac{3}{2} + 2e - \frac{e^2}{2} \right) x + 1
\approx 1.47624622100628x^2 + 0.242035607452765x + 1$$

so $L_2(1.5) \approx 1.47624622100628(1.5)^2 + 0.242035607452765(1.5) + 1 = 4.684607408443277$.

Method 2 is better if you have more points at which to evaluate, and method 1 is better if you plan to add points
of interpolation. However, neither method is particularly convenient. Even less convenient than evaluating the
polynomial is the task of adding another point of interpolation. Previous work is of limited use. And we haven’t
even begun to discuss the trouble of writing a computer program to automate the calculations. Neville’s method
can be used to overcome these limitations when the value of the polynomial at a specific point is required.

Neville’s method is based on the observation that interpolating polynomials can be constructed recursively.
Suppose $P_{k,l}$ is the polynomial of degree at most $l$ interpolating the data

$$(x_k, f(x_k)), (x_{k+1}, f(x_{k+1})), \ldots, (x_{k+l}, f(x_{k+l})).$$

Then, by definition, $P_{0,n}$ is the polynomial of degree at most $n$ interpolating the data

$$(x_0, f(x_0)), (x_1, f(x_1)), \ldots, (x_n, f(x_n)).$$

Moreover, $P_{0,n}$ can be computed using the recursive formula

$$P_{i,m+1}(x) = \frac{(x - x_{i+m+1}) P_{i,m}(x) - (x - x_i) P_{i+1,m}(x)}{x_i - x_{i+m+1}} \tag{3.2.4}$$
$$P_{i,0}(x) = f(x_i), \quad i = 0, \ldots, n.$$

This claim can be checked by noting five things:

1. $P_{i,0}$ is the degree 0 polynomial interpolating the one datum $(x_i, f(x_i))$.

2. $P_{i,m}$ and $P_{i+1,m}$ are polynomials of degree at most $m$, so $P_{i,m+1}$ is a polynomial of degree at most $m + 1$.

3. $P_{i,m+1}(x_i) = \frac{(x_i - x_{i+m+1}) P_{i,m}(x_i)}{x_i - x_{i+m+1}} = P_{i,m}(x_i) = f(x_i)$.

4. For any $j = i + 1, \ldots, i + m$,

$$P_{i,m+1}(x_j) = \frac{(x_j - x_{i+m+1}) f(x_j) - (x_j - x_i) f(x_j)}{x_i - x_{i+m+1}}
= \frac{f(x_j) \left[ (x_j - x_{i+m+1}) - (x_j - x_i) \right]}{x_i - x_{i+m+1}} = f(x_j).$$

5. $P_{i,m+1}(x_{i+m+1}) = \frac{-(x_{i+m+1} - x_i) P_{i+1,m}(x_{i+m+1})}{x_i - x_{i+m+1}} = P_{i+1,m}(x_{i+m+1}) = f(x_{i+m+1})$.

A rigorous proof by induction on $m$, requested in the exercises, should follow closely these notes. Points 1
and 2 establish that $P_{k,l}$ has degree at most $l$. Points 3 through 5 establish that $P_{k,l}$ interpolates the points
$(x_k, f(x_k)), (x_{k+1}, f(x_{k+1})), \ldots, (x_{k+l}, f(x_{k+l}))$. Formula 3.2.4 succinctly summarizes Neville’s method.

While Neville’s method (formula 3.2.4) can be used to find formulas for interpolating polynomials as in

$$P_{0,1}(x) = \frac{(x - x_1) P_{0,0}(x) - (x - x_0) P_{1,0}(x)}{x_0 - x_1} = \frac{x - x_1}{x_0 - x_1} f(x_0) + \frac{x - x_0}{x_1 - x_0} f(x_1),$$

it is normally used to find the value of an interpolating polynomial at a specific point. We earlier determined that
$L_2(1.5) = 4.684607408443277$ for the polynomial, $L_2(x)$, interpolating $f(x) = e^x$ at $x = 0, 1, 2$. We now find this
value using Neville’s method. $P_{0,0}(1.5) = f(0) = 1$, $P_{1,0}(1.5) = f(1) \approx 2.718281828459045$, and $P_{2,0}(1.5) = f(2) \approx
7.38905609893065$. So

$$P_{0,1}(1.5) = \frac{(1.5 - x_1) P_{0,0}(1.5) - (1.5 - x_0) P_{1,0}(1.5)}{x_0 - x_1}
= \frac{(1.5 - 1)(1) - (1.5 - 0)(2.718281828459045)}{0 - 1} \approx 3.577422742688568$$

$$P_{1,1}(1.5) = \frac{(1.5 - x_2) P_{1,0}(1.5) - (1.5 - x_1) P_{2,0}(1.5)}{x_1 - x_2}
= \frac{(1.5 - 2)(2.718281828459045) - (1.5 - 1)(7.38905609893065)}{1 - 2} \approx 5.053668963694848$$

$$P_{0,2}(1.5) = \frac{(1.5 - x_2) P_{0,1}(1.5) - (1.5 - x_0) P_{1,1}(1.5)}{x_0 - x_2}
= \frac{(1.5 - 2)(3.577422742688568) - (1.5 - 0)(5.053668963694848)}{0 - 2} \approx 4.684607408443278.$$

Table 3.2: Neville’s method example, calculating $P_{0,2}(1.5)$.

    x_i    P_{i,0} = f(x_i)     P_{i,1}              P_{i,2}
    0      1                    3.577422742688568    4.684607408443278
    1      2.718281828459045    5.053668963694848
    2      7.38905609893065

A tabulation of the computation may make it easier to internalize the recursion and imagine how this process might
be automated. Table 3.2 shows such a tabulation. The use of this recursive formula may be more difficult than
direct computation for a human being, but for a computer, using the recursion is much quicker and simpler as
evidenced by a look at the pseudo-code.

Assumptions: $P_n(x)$ is the degree at most $n$ polynomial interpolating the data
$(x_0, f(x_0)), (x_1, f(x_1)), \ldots, (x_n, f(x_n))$ and the value $P_n(x)$ is desired.

Input: Value $x$; abscissas $x_0, x_1, \ldots, x_n$; ordinates $f(x_0), f(x_1), \ldots, f(x_n)$.

Step 1: For $i = 0 \ldots n$ do Step 2:

    Step 2: Set $P_{i,0} = f(x_i)$;

Step 3: For $j = 1 \ldots n$ do Steps 4-5:

    Step 4: For $i = 0 \ldots n - j$ do Step 5:

        Step 5: Set $P_{i,j} = \frac{(x - x_{i+j}) P_{i,j-1} - (x - x_i) P_{i+1,j-1}}{x_i - x_{i+j}}$

Output: Table of values, $P$. $P_{0,n}$ holds the desired value, $L_n(x)$.


Uniqueness 

There are some subtleties we have thus far glossed over. When we introduced the Lagrange form, we casually stated
“$L_n$ is called the Lagrange form of $P_n$”, implying that the Lagrange form gives the interpolating polynomial of least
degree (since $P_n$ is defined as such)! This fact is far from obvious. Nonetheless, we went on as if it were obvious that
$L_n$ and $P_n$ were one and the same polynomial. Worse yet, when we came around to discussing Neville’s method,
we calculated $P_{0,2}(1.5)$ and compared it to $L_2(1.5)$ from earlier with the implication that they should be the same,
again as if it were simply given that $P_{0,2}$ and $L_2$ should be the same polynomial. The following result shows that
our blind faith that $P_n$, $L_n$, and $P_{0,n}$ amount to different names for the same object was not misplaced (by virtue
of the fact that they all interpolate the same data and have degree at most $n$).

Theorem 7. The polynomial, $P_n$, of least degree interpolating the data $(x_0, y_0), (x_1, y_1), \ldots, (x_n, y_n)$ exists and is
unique. Moreover, any interpolating polynomial of degree at most $n$ is equal to $P_n$.

Proof. By construction, $L_n$ interpolates the data. Moreover, the degree of $L_n$ is at most $n$ since it is the sum of
polynomials $p_i$ each with degree exactly $n$. Thus $P_n$ exists and has degree at most $n$ [at this point, we must
admit that the degree of $P_n$ may be less than that of $L_n$]. Now suppose $q$ is any polynomial interpolating
$(x_0, y_0), (x_1, y_1), \ldots, (x_n, y_n)$ with degree $n$ or less. Then the polynomial $f = P_n - q$ also has degree $n$ or less.
Moreover, $f(x_i) = P_n(x_i) - q(x_i) = y_i - y_i = 0$ for all $i = 0, \ldots, n$. Thus $f$ has $n + 1$ roots. Alas, the only way $f$
can have $n + 1$ roots and have degree $n$ or less is if $f$ is identically 0. Hence, $f(x) = P_n(x) - q(x) = 0$, implying
$P_n(x) = q(x)$ for all $x$. □


Octave 


The indices presented in the pseudo-code are predicated on indexing starting with 0, as in the mathematical 
description. In Octave, however, indices can not be 0. They are always positive integers. A slight modification of 
the indices is required to accommodate this discrepancy. 


    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Written by Leon Brin                      22 March 2014 %
    % Purpose: This function implements Neville's method for  %
    %          computing the value P(xhat) of the interpolating %
    %          polynomial P passing through the data (x(1),y(1)), %
    %          (x(2),y(2)),...,(x(n),y(n)).                   %
    % INPUT: value xhat; array x of abscissas; array y of     %
    %        ordinates.                                       %
    % OUTPUT: table of values Q; Q(1,n)=P(xhat).              %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    function Q = nevilles(xhat,x,y)
      n=length(x);
      for i=1:n
        Q(i,1)=y(i);
      end%for
      for j=2:n
        for i=1:n+1-j
          Q(i,j)=((xhat-x(i+j-1))*Q(i,j-1)-(xhat-x(i))*Q(i+1,j-1))/(x(i)-x(i+j-1));
        end%for
      end%for
    end%function


nevilles.m may be downloaded at the companion website. 
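One advantage of Neville's method is that adding a point of interpolation reuses the whole table. The script below is a hedged illustration of that idea, not part of nevilles.m: it repeats the table-filling loop inline for $f(x) = e^x$ at $x = 0, 1, 2$, then appends the (assumed, purely illustrative) extra point $(3, e^3)$ and fills in only the new entries.

```octave
% Sketch: the table-filling loop of Neville's method, then one more data point added.
x = [0 1 2];  y = exp(x);  xhat = 1.5;     % example data from the text
n = numel(x);
Q = zeros(n);
Q(:,1) = y(:);
for j = 2:n
  for i = 1:n+1-j
    Q(i,j) = ((xhat-x(i+j-1))*Q(i,j-1) - (xhat-x(i))*Q(i+1,j-1)) / (x(i)-x(i+j-1));
  end
end
P2 = Q(1,n);                               % value of the quadratic interpolant

% Add the point (3, e^3): only one new entry per column is needed.
x(end+1) = 3;  y(end+1) = exp(3);  n = n + 1;
Q(n,n) = 0;                                % grow the table by one row and column
Q(n,1) = y(n);
for j = 2:n
  i = n + 1 - j;                           % the single new row index in column j
  Q(i,j) = ((xhat-x(i+j-1))*Q(i,j-1) - (xhat-x(i))*Q(i+1,j-1)) / (x(i)-x(i+j-1));
end
P3 = Q(1,n);                               % value of the cubic interpolant
printf("P2 = %.15f, P3 = %.15f\n", P2, P3);
```

The second loop touches one entry per column, so the extra point costs on the order of $n$ operations instead of a full recomputation.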


Key Concepts 

Interpolating function: A function whose graph is required to pass through a set of prescribed points. 
Interpolating polynomial: A polynomial whose graph is required to pass through a set of prescribed points. 


Interpolating polynomial of least degree: The polynomial of least degree interpolating a given set of n + 1 
data points is unique. We denote this polynomial by P n . 

Interpolating polynomial of degree at most n: The polynomial interpolating n + 1 distinct points has degree 
at most n and is equal to the polynomial of least degree interpolating the points. 

Generalized Rolle’s theorem: Suppose that $f$ has $n$ derivatives on $(a, b)$ and $f, f', f'', \ldots, f^{(n-1)}$ are all
continuous on $[a, b]$. If $f(x_0) = f(x_1) = \cdots = f(x_n)$ for some $a \leq x_0 < x_1 < \cdots < x_n \leq b$, then there exists $\xi \in (a, b)$
such that $f^{(n)}(\xi) = 0$.


Lagrange form of an interpolating polynomial: The Lagrange form, $L_n$, of the polynomial of degree at most
$n$ interpolating the points $(x_0, y_0), (x_1, y_1), \ldots, (x_n, y_n)$ is given by the formula

$$L_n(x) = \sum_{i=0}^{n} \frac{p_i(x)}{p_i(x_i)}\, y_i$$

where $p_i(x) = \prod_{j=0,\, j \neq i}^{n} (x - x_j) = (x - x_0) \cdots (x - x_{i-1})(x - x_{i+1}) \cdots (x - x_n)$.

Interpolation error: For $P_n$, the interpolating polynomial of least degree passing through the $n + 1$ points
$(x_0, y_0), (x_1, y_1), \ldots, (x_n, y_n)$, there is a value $\xi_x \in (a, b)$ such that

$$f(x) - P_n(x) = \frac{f^{(n+1)}(\xi_x)}{(n + 1)!}(x - x_0) \cdots (x - x_n),$$

assuming $f$ has $n + 1$ derivatives on $(a, b)$ and $f, f', \ldots, f^{(n)}$ are all continuous on $[a, b]$, and where
$a = \min(x_0, \ldots, x_n, x)$ and $b = \max(x_0, \ldots, x_n, x)$.

Sidi’s method: A root-finding method summarized by the formula

$$x_{n+1} = x_n - \frac{f(x_n)}{p'_{n,k}(x_n)}$$

where $p_{n,k}$ is the interpolating polynomial passing through the points

$$(x_n, f(x_n)), (x_{n-1}, f(x_{n-1})), \ldots, (x_{n-k}, f(x_{n-k})).$$


Neville’s method: A method for computing the interpolating polynomial of least degree or values of it based on
the recursive relation

$$P_{i,m+1}(x) = \frac{(x - x_{i+m+1}) P_{i,m}(x) - (x - x_i) P_{i+1,m}(x)}{x_i - x_{i+m+1}}$$
$$P_{i,0}(x) = f(x_i)$$

where $P_{k,l}$ is the polynomial of least degree interpolating the data

$$(x_k, f(x_k)), (x_{k+1}, f(x_{k+1})), \ldots, (x_{k+l}, f(x_{k+l})).$$


Exercises 

1. Write down the Lagrange interpolating polynomial 
passing through (1,2), (1.5,— 0.83), and (2.11,— 1). 

2. Find a polynomial that passes through the four points 

(0,0), (1,2), (4,-3), and (10,-1). 

3. Construct the (at most) quadratic Lagrange Polyno- 
mial interpolating the data. 

(a) (1,1), (2,1), and (3,2) 

(b) (0,10), (30,58), (1029,-32) 

(c) (-10,10), (20,58), (1019,-32) [s] 

(d)

    x       5     200    10
    f(x)    15    2      15

(e)

    x       -5    -2     3
    f(x)    15    2      15

4. Suppose the data from question 3 were taken from an
appropriately differentiable function $f$. Use the interpolating
polynomial you found in question 3 to estimate
$f(1.3)$. [S]

5. Find the estimate in question 4 using Neville’s method. [S]

6. Given the following data for $f(x)$, approximate $f(0.3)$
using an interpolating polynomial of degree at most

(a) 1

(b) 2

(c) 3

    x       0      1      2       3
    f(x)    0.8    0.7    0.75    0.5


7. Given the following data for $f(x)$, approximate $f(3)$
using an interpolating polynomial of degree at most

(a) 1

(b) 2

(c) 3

    x       2      3.5    4       5
    f(x)    0.8    0.7    0.75    0.5


8. Use interpolating polynomials of degrees one, two,
and three to approximate each of the following:

(a) $f(0.43)$ if $f(0) = 1$, $f(0.25) = 1.64872$, $f(0.5) =
2.71828$, $f(0.75) = 4.48169$.

(b) $f(0.18)$ if $f(0.1) = -0.29004986$, $f(0.2) =
-0.56079734$, $f(0.3) = -0.81401972$, $f(0.4) =
-1.0526302$. [S]

(c) $f(2.26)$ if $f(1) = 1.654$, $f(1.5) = -2.569$, $f(2) =
-1.329$, $f(2.5) = 1.776$. [S]

(d) $f(11.26)$ if $f(10) = -0.7865$, $f(11) = -1.2352$,
$f(12) = -0.8765$, $f(13) = 0.0021$.

9. Let $x_0 = 1$, $x_1 = 1.25$, and $x_2 = 1.6$. Using data
at these $x_i$, construct interpolating polynomials of degrees
at most one and at most two and use them to
approximate $f(1.4)$. Find the absolute errors.

(a) $f(x) = \sin \pi x$

(b) fix) = \Jx - 1 

(c) f{x) = e 2 *- 4 

(d) $f(x) = \ln(10x)$

10. Use formula 3.2.3 to find theoretical error bounds for
the approximations in question 9. Compare the bound
to the actual error.


11. A Lagrange interpolating polynomial is constructed for
the function $f(x) = (\sqrt{2})^x$ using $x_0 = 0$, $x_1 = 1$,
$x_2 = 2$, $x_3 = 3$. It is used to approximate $f(1.5)$.
Find a bound on the error in this approximation.

12. Find the polynomial referred to in question 11. Then

(a) use the polynomial to approximate $f(1.5)$; and

(b) calculate the actual error of this approximation,
and compare it to the bound you calculated in
question 11.

13. Use Neville’s method to find the approximation in
question 11.

14. The height of a model rocket is given at several times
in the following table. Approximate the height of the
rocket at time $t = 0.6$ sec using at least two different
sets of points. Comment on which approximation is
likely most accurate.

    Time (sec)    Height (ft)
    0.53238       30.0534
    0.56040       32.7929
    0.58842       35.4956
    0.61644       38.1575

15. The following table results from using Neville’s method
to approximate $f(0.4)$.

    0       1          2.6        P_{0,2}    3.016
    0.25    2          P_{1,1}    2.96
    0.5     P_{2,0}    2.4
    0.75    8

Determine $f(0.5)$.

16. $L_3(x) = -7x^3 + 57x^2 - 134x + 78$ is the degree (at
most) 3 interpolating polynomial for the data in the
table. Find $u$. [A]

    x    0.5       0.8      u    1.4
    y    24.375    3.696    0    -17.088

17. Let $P_3(x)$ be the interpolating polynomial for the data
$(0, 0)$, $(0.5, y)$, $(1, 3)$, $(2, 2)$. Find $y$ if the coefficient of
$x^3$ in $P_3(x)$ is 6.

18. Let $f(x) = \sqrt{x - x^2}$ and $P_2(x)$ be the interpolating
polynomial on $x_0 = 0$, $x_1$, and $x_2 = 1$. Find the largest
value of $x_1$ in $(0, 1)$ for which $f(0.5) - P_2(0.5) = -0.25$.

19. The interpolating polynomial on $n + 1$ points does not
always have degree $n$. It has degree at most $n$. Plot the
data $(1, 1)$, $(2, 3)$, $(3, 5)$, and $(4, 7)$, and make a conjecture
as to the degree of the polynomial interpolating
these four points. What led you to your conjecture?


20. Use Neville’s method to find the polynomial described 
in question 19. Does it have the degree you expected? 

21. Let 


Xj 

fix) 

Pnix) 


Find 


1 r-j— r for j =0,1, 2,... 

J + 1 

5 + 3* 2018 


the interpolating polynomial 

passing through 

ix 0 , fix 0 )), . . . ,ix„, fix n )). 


lim P n ( 1). 


22. Let $f(x) = e^{-x}$. Two different numbers are chosen at
random from the interval $[0, 1]$, say $x_0$ and $x_1$. Then
the points $(x_0, f(x_0))$ and $(x_1, f(x_1))$ are used to get a
linear Lagrange interpolation approximation to $f$ over
the interval $[0, 1]$. Find a bound (good for the entire interval
and every pair of points $x_0$ and $x_1$) for the error
in using this approximation.

23. Supply the inductive proof that $P_{0,n}$ is the polynomial
of degree at most $n$ interpolating the data
$(x_0, f(x_0)), (x_1, f(x_1)), \ldots, (x_n, f(x_n))$. See notes on
page 112.


3.3 Newton Polynomials

In this section, we are interested in an efficient automated process for calculating interpolating polynomials. The 
Lagrange form of an interpolating polynomial is best suited for pencil and paper calculations, not computer au- 
tomation. Neville’s method is well suited for computing the value of an interpolating polynomial at a particular 
point, not calculation of the polynomial itself. True, Neville’s method can be used to calculate the interpolating 
polynomials themselves, but it lends itself to this task no better than the Lagrange form. Presently, we will discover 
how the same recursive formula used in Neville’s method is used to derive a very efficient, computer-friendly method 
for calculating interpolating polynomials themselves. The result of the computation is a set of coefficients for the 
Newton form of a polynomial. 

Suppose we have already computed the polynomial $N_n(x)$ interpolating the data

$$(x_0, f(x_0)), (x_1, f(x_1)), \ldots, (x_n, f(x_n)).$$

We now wish to compute the polynomial $N_{n+1}(x)$ interpolating the data

$$(x_0, f(x_0)), \ldots, (x_{n+1}, f(x_{n+1})),$$

and we would like to recycle the work we have already done (much the same way we could add a point of interpolation
in Neville’s method and reuse all previous work)! One way to attack the problem is to find a polynomial $q(x)$ such
that

$$N_{n+1}(x) = N_n(x) + q(x).$$

If the attack is to be successful, we must have $q(x) = N_{n+1}(x) - N_n(x)$ for all $x$, and, in particular, $q(x_j) =
N_{n+1}(x_j) - N_n(x_j)$ for $j = 0, 1, \ldots, n + 1$. But $N_{n+1}(x_j) - N_n(x_j) = f(x_j) - f(x_j) = 0$ for $j = 0, 1, \ldots, n$, and
$N_{n+1}(x_{n+1}) - N_n(x_{n+1}) = f(x_{n+1}) - N_n(x_{n+1})$. In other words, we seek the polynomial $q$ interpolating the points

$$(x_0, 0), (x_1, 0), \ldots, (x_n, 0), (x_{n+1}, (f - N_n)(x_{n+1})).$$

Ironically, this is a job for the Lagrange form:

$$q(x) = \frac{(x - x_0) \cdots (x - x_n)}{(x_{n+1} - x_0) \cdots (x_{n+1} - x_n)} (f - N_n)(x_{n+1})
= \frac{(f - N_n)(x_{n+1})}{(x_{n+1} - x_0) \cdots (x_{n+1} - x_n)} (x - x_0) \cdots (x - x_n). \tag{3.3.1}$$

But $\frac{(f - N_n)(x_{n+1})}{(x_{n+1} - x_0) \cdots (x_{n+1} - x_n)}$ is just a constant, so we replace it by $a_{n+1}$ so that we have $q(x) = a_{n+1}(x - x_0) \cdots (x - x_n)$.
Of course we can calculate $a_{n+1}$ using the formula $a_{n+1} = \frac{(f - N_n)(x_{n+1})}{(x_{n+1} - x_0) \cdots (x_{n+1} - x_n)}$, but there is a better way, which we will
see shortly. We can also learn from the upcoming computation the most convenient form for $N_n$.

When $n = 0$, $q$ has the form $a_1(x - x_0)$; when $n = 1$, $q$ has the form $a_2(x - x_0)(x - x_1)$; when $n = 2$, $q$ has
the form $a_3(x - x_0)(x - x_1)(x - x_2)$; and so on. Of course $N_0(x) = a_0$ is constant since it is the interpolating
polynomial of least degree passing through a single point. So $N_1(x) = N_0(x) + a_1(x - x_0)$ immediately takes the
form $a_0 + a_1(x - x_0)$; $N_2(x)$ immediately takes the form $a_0 + a_1(x - x_0) + a_2(x - x_0)(x - x_1)$; $N_3(x)$ immediately
takes the form $a_0 + a_1(x - x_0) + a_2(x - x_0)(x - x_1) + a_3(x - x_0)(x - x_1)(x - x_2)$; and so on. This would suggest
that the most convenient form for $N_{n+1}$, the one that requires no simplification, is

$$N_{n+1}(x) = a_0 + a_1(x - x_0) + a_2(x - x_0)(x - x_1) + \cdots + a_{n+1}(x - x_0) \cdots (x - x_n). \tag{3.3.2}$$

Given in this form, the unknown quantity, $a_{n+1}$, appears as the coefficient of the $x^{n+1}$ term. Consequently, $a_{n+1}$ is
potentially the leading coefficient of $N_{n+1}$. If $a_{n+1}$ were zero, then we would not call it the leading coefficient. We
will facilitate the rest of this discussion by introducing the following term. For an interpolating polynomial on $k + 1$
points, the coefficient of its $x^k$ term is called its potential leading coefficient (even if it happens to be zero).
Since this potential leading coefficient is the crux of our problem, we focus attention on determining the potential
leading coefficient of any interpolating polynomial.

Here is where the recursive formula

$$P_{i,m+1}(x) = \frac{(x - x_{i+m+1}) P_{i,m}(x) - (x - x_i) P_{i+1,m}(x)}{x_i - x_{i+m+1}}$$
$$P_{i,0}(x) = f(x_i)$$

used in devising Neville’s method comes in handy. Inasmuch as $P_{i,m}$ and $P_{i+1,m}$ both have degree at most $m$,
their potential leading coefficients are the coefficients of their $x^m$ terms. It follows that the coefficient of the $x^{m+1}$
term of $(x - x_{i+m+1}) P_{i,m}(x)$ equals the potential leading coefficient of $P_{i,m}(x)$, and, similarly, the coefficient of
the $x^{m+1}$ term of $(x - x_i) P_{i+1,m}(x)$ equals the potential leading coefficient of $P_{i+1,m}$. Therefore, the coefficient of the
$x^{m+1}$ term of $(x - x_{i+m+1}) P_{i,m}(x) - (x - x_i) P_{i+1,m}(x)$ is the difference of the potential leading coefficients of $P_{i,m}$
and $P_{i+1,m}$. To simplify the discussion, we use the notation $f_{i,j}$ for the potential leading coefficient of $P_{i,j}$. Now the
coefficient of the $x^{m+1}$ term of $(x - x_{i+m+1}) P_{i,m}(x) - (x - x_i) P_{i+1,m}(x)$ is just $f_{i,m} - f_{i+1,m}$. Hence, the potential
leading coefficient $f_{i,m+1}$ of $P_{i,m+1}$ (the coefficient of the $x^{m+1}$ term of $P_{i,m+1}$) is given by

$$f_{i,m+1} = \frac{f_{i,m} - f_{i+1,m}}{x_i - x_{i+m+1}} \tag{3.3.3}$$
$$f_{i,0} = f(x_i).$$

Crumpet 22: Divided Differences


While we choose to use the notation $f_{i,j}$ for the potential leading coefficient of $P_{i,j}$, it is much more customary
to use the expanded notation $f[x_i, x_{i+1}, \ldots, x_{i+j}]$ for this quantity, and to call it a $j$th divided difference.


Finally, we have a formula for the potential leading coefficient that recycles previous calculations. Since $N_{n+1}$
and $P_{0,n+1}$ interpolate the same set of points and both have degree at most $n + 1$, they are equal by theorem
7. Therefore, their potential leading coefficients, $a_{n+1}$ and $f_{0,n+1}$, are equal. By recursion 3.3.3, we then have

$$a_{n+1} = f_{0,n+1} = \frac{f_{0,n} - f_{1,n}}{x_0 - x_{n+1}}.$$

It can not be stressed enough that we have not discovered a new polynomial. We have only discovered a new
way to calculate the same old interpolating polynomials. $N_n$, $L_n$, and $P_{0,n}$ all interpolate the same data and all
have degree at most $n$. They are, therefore, equal by theorem 7. Just the forms in which they are written possibly
differ. The polynomial form in equation 3.3.2 is called the Newton form.


Crumpet 23: Newton Polynomials 


Typically, the Newton form and divided differences are presented completely independent of Neville's recursive
formula, an approach that takes considerably more work to develop. There are reasons to do so, however. Refraining
from the use of Neville's formula follows more closely the historical development of the subject since Newton
(1643-1727) preceded Neville (1889-1961) by over 200 years! Moreover, following the historical development more
naturally leads to further study of divided differences.


As an example, take the polynomial interpolating $f(x) = e^x$ at $x = 0, 1, 2$, as we did in the discussion of Neville's
method on page 111. $f_{0,0} = f(0) = 1$, $f_{1,0} = f(1) \approx 2.718281828459045$, and $f_{2,0} = f(2) \approx 7.38905609893065$. So

$$f_{0,1} = \frac{f_{0,0} - f_{1,0}}{x_0 - x_1} = \frac{1 - 2.718281828459045}{0 - 1} \approx 1.718281828459045$$

$$f_{1,1} = \frac{f_{1,0} - f_{2,0}}{x_1 - x_2} = \frac{2.718281828459045 - 7.38905609893065}{1 - 2} \approx 4.670774270471606$$

$$f_{0,2} = \frac{f_{0,1} - f_{1,1}}{x_0 - x_2} = \frac{1.718281828459045 - 4.670774270471606}{0 - 2} \approx 1.47624622100628.$$
Table 3.3: Newton form example, calculating $N_2(x)$.

$x_i$   $f_{i,0} = f(x_i)$     $f_{i,1}$             $f_{i,2}$
0       1                      1.718281828459045     1.47624622100628
1       2.718281828459045      4.670774270471606
2       7.38905609893065


Therefore, $N_2(x) = 1 + 1.718281828459045x + 1.47624622100628x(x - 1)$. The $f_{0,i}$ are the coefficients of $N_n$. Though
this computation is manageable without a table, it is most convenient to tabulate the values of $f_{i,j}$ as they are
computed (just as is the case for Neville's method). This is true for both humans and computers! A tabulation
of the computation makes it easier to internalize the recursion and imagine how this process might be automated.
Table 3.3, which is called a table of divided differences (or divided difference table), shows such a tabulation. Adding
a data point to the interpolation is as easy as computing another diagonal of coefficients (just like Neville's method).
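As a quick sanity check, the Newton form found in this example can be evaluated at the three nodes to confirm that it reproduces the data. The sketch below copies the coefficients from Table 3.3; the names `a`, `N2`, `nodes`, and `err` are chosen here for illustration only.

```octave
% Coefficients f_{0,0}, f_{0,1}, f_{0,2} copied from Table 3.3
a = [1, 1.718281828459045, 1.47624622100628];

% Newton form N_2(x) = a0 + a1*(x - 0) + a2*(x - 0)*(x - 1)
N2 = @(x) a(1) + a(2)*x + a(3)*x.*(x - 1);

% N_2 should agree with e^x at the nodes 0, 1, 2 (up to rounding)
nodes = [0, 1, 2];
err = abs(N2(nodes) - exp(nodes));
disp(max(err))
```

The largest discrepancy should be on the order of rounding error, since $N_2$ interpolates $e^x$ at these nodes.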


Sidi’s Method 


We now return attention to Sidi's $k^{\text{th}}$ degree root-finding method,

$$x_{n+1} = x_n - \frac{g(x_n)}{p'_{n,k}(x_n)},$$

where $p_{n,k}$ is the interpolating polynomial passing through the points

$$(x_n, g(x_n)),\ (x_{n-1}, g(x_{n-1})),\ \ldots,\ (x_{n-k}, g(x_{n-k})).$$

In its Newton form,

$$p_{n,k}(x) = g_{n,0} + g_{n-1,1}(x - x_n) + g_{n-2,2}(x - x_n)(x - x_{n-1}) + \cdots + g_{n-k,k}(x - x_n)\cdots(x - x_{n-k+1}),$$

so

$$p'_{n,k}(x_n) = g_{n-1,1} + g_{n-2,2}(x_n - x_{n-1}) + \cdots + g_{n-k,k}(x_n - x_{n-1})\cdots(x_n - x_{n-k+1}). \tag{3.3.4}$$

In particular,

$$p'_{n,2}(x_n) = g_{n-1,1} + (x_n - x_{n-1})g_{n-2,2}$$

and

$$p'_{n,3}(x_n) = g_{n-1,1} + (x_n - x_{n-1})g_{n-2,2} + (x_n - x_{n-1})(x_n - x_{n-2})g_{n-3,3}$$

and so on. As a nested product,

$$p'_{n,k}(x_n) = g_{n-1,1} + (x_n - x_{n-1})\left[g_{n-2,2} + (x_n - x_{n-2})\left[\cdots + (x_n - x_{n-k+1})\left[g_{n-k,k}\right]\cdots\right]\right].$$

The nested form is particularly efficient for implementation.

Assumptions: $g$ is $k$ times differentiable.

Input: Initial values $x_0, x_1, \ldots, x_k$; diagonal entries $g_{k,0}, g_{k-1,1}, \ldots, g_{0,k}$ of the divided difference table for $g$.

Step 1: Set $s = g_{0,k}$;

Step 2: For $i = 1, 2, \ldots, k - 1$ do Step 3:

    Step 3: Set $s = (x_k - x_i)s + g_{i,k-i}$;

Step 4: Set $x_{k+1} = x_k - \dfrac{g_{k,0}}{s}$;

Output: Approximation $x_{k+1}$.

While this pseudo-code is good as far as it goes, it is far from complete. The most obvious deficiency is that it only
executes one step of Sidi's method. A less obvious deficiency is that its input and output do not match in type or
quantity, so at the end of the routine, the computer is still not ready to compute another iteration. What we get
from this routine is $x_{k+1}$. What we need to run it again are the two arrays $x_0, x_1, \ldots, x_k$ and $g_{k,0}, g_{k-1,1}, \ldots, g_{0,k}$.
In order to prepare these arrays for the next iteration, we must re-index the values of $x_i$ and then compute new
values for the $g_{i,k-i}$.
Assumptions: $g$ is $k$ times differentiable.

Input: Initial values $x_0, x_1, \ldots, x_k$; diagonal entries $g_{k,0}, g_{k-1,1}, \ldots, g_{0,k}$ of the divided difference table for $g$.

Step 1: Set $x_{k+1}$ according to Sidi's method applied to $x_0, x_1, \ldots, x_k$ and $g_{k,0}, g_{k-1,1}, \ldots, g_{0,k}$;

Step 2: Set $g_{k+1,0} = g(x_{k+1})$;

Step 3: For $i = k, k - 1, \ldots, 1$ do Step 4:

    Step 4: Set $g_{i,k+1-i} = \dfrac{g_{i+1,k-i} - g_{i,k-i}}{x_{k+1} - x_i}$;

Output: Approximations $x_1, \ldots, x_{k+1}$ and corresponding diagonal entries $g_{k+1,0}, g_{k,1}, \ldots, g_{1,k}$ of the divided
difference table for $g$.


This new pseudo-code, which utilizes the previous pseudo-code in its first step, is an improvement. Now the input
and output match in type and quantity, meaning the output of this routine may be used as input for the next
iteration. However, this routine still only calculates one step of Sidi's method. Moreover, we have been ignoring
another issue. Each of the routines spelled out in pseudo-code so far assumes we have the diagonal entries of the
corresponding divided difference table. It is not good practice to make the user of the code worry about this detail.
The routine we write should supply these values. After all, the end-user, the person trying to find a root of a
function, will only have immediate access to the function and some number of initial values. The routine must
supply the rest. Finally, we present pseudo-code in the spirit of other root-finding methods.


Assumptions: $g$ has a root at $\hat{x}$; $g$ is $k$ times differentiable; $x_0, x_1, \ldots, x_k$ are sufficiently close to $\hat{x}$.

Input: Initial values $x_0, x_1, \ldots, x_k$; function $g$; desired accuracy tol; maximum number of iterations $N$.

Step 1: For $i = 0, 1, \ldots, k$ do Step 2:

    Step 2: Set $g_{i,0} = g(x_i)$;

Step 3: For $j = 1, 2, \ldots, k$ do Steps 4-5:

    Step 4: For $i = 0, 1, \ldots, k - j$ do Step 5:

        Step 5: Set $g_{i,j} = \dfrac{g_{i+1,j-1} - g_{i,j-1}}{x_{i+j} - x_i}$;

Step 6: For $i = 1 \ldots N$ do Steps 7-11:

    Step 7: Compute $x = x_{k+1}$ according to Sidi's method applied to $x_0, x_1, \ldots, x_k$ and $g_{k,0}, g_{k-1,1}, \ldots, g_{0,k}$;

    Step 8: If $|x - x_k| < tol$ then return $x$;

    Step 9: Compute $g_{k+1,0}, g_{k,1}, \ldots, g_{1,k}$;

    Step 10: Set $x_0 = x_1$; $x_1 = x_2$; $\ldots$; $x_{k-1} = x_k$; $x_k = x$;

    Step 11: Set $g_{k,0} = g_{k+1,0}$; $g_{k-1,1} = g_{k,1}$; $\ldots$; $g_{0,k} = g_{1,k}$;

Step 12: Print "Method failed. Maximum iterations exceeded."

Output: Approximation $x$ near exact root, or message of failure.


As complete as this latest pseudo-code is, it leaves one item unaddressed. It requires $k + 1$ initial values to run Sidi's $k^{\text{th}}$
degree method. When we encountered the secant method, we noted that needing two initial values as opposed to
one was a disadvantage. The disadvantage is only magnified in Sidi's method where $k + 1$ initial values are required.
However, just as with the secant method, we can automatically generate initial values if needed. If Sidi's method is
given one initial value, $x_0$, and we are trying to find a root of the function $g$, then we can set $x_1 = x_0 + g(x_0)$ just
as we did for the secant method. You may recall, this was not particularly successful, however. The secant method
often failed to converge with this selection of initial condition.

Much less is known about Sidi's method and how the selection of initial values affects convergence. It might
make an interesting project to analyze good and bad practices for selecting initial values. In any case, if you have
initial values $x_0, x_1, \ldots, x_j$ with $1 \le j < k$, the remaining $k - j$ initial values can be found using Sidi's method
of degree $j$ (on $x_0, x_1, \ldots, x_j$) to get $x_{j+1}$ followed by using Sidi's method of degree $j + 1$ (on $x_0, x_1, \ldots, x_{j+1}$) to
get $x_{j+2}$ followed by using Sidi's method of degree $j + 2$ (on $x_0, x_1, \ldots, x_{j+2}$) to get $x_{j+3}$, and so on until $x_k$ is
computed.
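The seeding strategy just described can be sketched in Octave as follows. This is only an illustration under stated assumptions: `sidi_step` and `seed_sidi` are hypothetical helper names (not part of the book's code), the divided difference table is rebuilt from scratch at every step for simplicity rather than efficiency, and the test function $g(x) = x^2 - 2$ is chosen only for demonstration.

```octave
1;  % marks this file as a script so the functions below can be defined inline

% One step of Sidi's method of degree length(x)-1 on the nodes in x.
function xnew = sidi_step(x, g)
  n = length(x);
  G = zeros(n, n);
  for i = 1:n
    G(i,1) = g(x(i));
  end
  for j = 2:n
    for i = 1:n+1-j
      G(i,j) = (G(i+1,j-1) - G(i,j-1)) / (x(i+j-1) - x(i));
    end
  end
  s = G(1,n);                        % nested evaluation of p'(x_n)
  for j = 2:n-1
    s = (x(n) - x(j))*s + G(j,n+1-j);
  end
  xnew = x(n) - G(n,1)/s;
end

% Extend a list of initial values until it holds k+1 of them.
function x = seed_sidi(x, k, g)
  if length(x) == 1
    x(2) = x(1) + g(x(1));           % same seeding as for the secant method
  end
  while length(x) < k+1
    x(end+1) = sidi_step(x, g);      % degree grows by one each pass
  end
end

g = @(x) x.^2 - 2;                   % hypothetical test function
x = seed_sidi([1, 1.5], 3, g);       % two values given, four needed for k = 3
disp(x)
```

Starting from 1 and 1.5, the first generated value is a secant step and the second is a degree 2 Sidi step. The resulting four values could then be handed to a degree 3 routine.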
Octave

As is the case with Neville’s method, the Octave code follows identically its corresponding pseudo-code except that 
indices have been modified to accommodate indexing beginning with 1, not 0. 


    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Written by Dr. Len Brin              1 April 2014 %
    % Purpose: Implementation of Sidi's Method          %
    % INPUT: function g; initial values x0,x1,...,xk;   %
    %        tolerance TOL; maximum number of           %
    %        iterations N                               %
    % OUTPUT: approximation X and number of iterations  %
    %         i; or message of failure                  %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    function [X,i] = sidi(x,TOL,N,g)
      n=length(x);
      % first column of the divided difference table
      for i=1:n
        G(i,1)=g(x(i));
      end%for
      % remaining columns of the divided difference table
      for j=2:n
        for i=1:n+1-j
          G(i,j)=(G(i+1,j-1)-G(i,j-1))/(x(i+j-1)-x(i));
        end%for
      end%for
      for i=1:N
        % nested evaluation of the derivative of the interpolating polynomial
        s=G(1,n);
        for j=2:n-1
          s=(x(n)-x(j))*s+G(j,n+1-j);
        end%for
        X=x(n)-G(n,1)/s;
        if (abs(X-x(n))<TOL)
          return
        end%if
        % new diagonal of the divided difference table
        G(n+1,1)=g(X);
        for j=n:-1:2
          G(j,n+2-j)=(G(j+1,n+1-j)-G(j,n+1-j))/(X-x(j));
        end%for
        % re-index the approximations and the diagonal
        for j=1:n-1
          x(j)=x(j+1);
        end%for
        x(n)=X;
        for j=1:n
          G(n+1-j,j)=G(n+2-j,j);
        end%for
      end%for
      X = "Method failed. Maximum iterations exceeded.";
    end%function

sidi.m may be downloaded at the companion website.


More divided differences 

Divided difference tables are generally computed for the sake of finding coefficients for one interpolating polynomial, 
and one interpolating polynomial only. However, each table of divided differences is rife with representations of 
interpolating polynomials. One of the strengths of a divided difference table is that its entries may be reused should 
more data be added. This same property can be thought of in reverse. Suppose you have a divided difference table 
computed over 4 data values but you are only interested in an at-most-degree-2 interpolating polynomial. The 
divided difference table 
$x_0$   $f_{0,0}$   $f_{0,1}$   $f_{0,2}$   $f_{0,3}$
$x_1$   $f_{1,0}$   $f_{1,1}$   $f_{1,2}$
$x_2$   $f_{2,0}$   $f_{2,1}$
$x_3$   $f_{3,0}$


actually gives us two different at-most-quadratic interpolating polynomials with four representations for each! First,
the table was devised to compute the interpolating polynomial

$$P_3(x) = f_{0,0} + f_{0,1}(x - x_0) + f_{0,2}(x - x_0)(x - x_1) + f_{0,3}(x - x_0)(x - x_1)(x - x_2).$$

Notice that if we simply truncate the $f_{0,3}(x - x_0)(x - x_1)(x - x_2)$ term, we still have an interpolating polynomial
with nodes $x_0, x_1, x_2$. We can support this claim in at least two ways. First, the term $f_{0,3}(x - x_0)(x - x_1)(x - x_2)$
is 0 at $x_0, x_1, x_2$ so it does not contribute to the interpolation at the nodes $x_0, x_1, x_2$. Second, we can "reverse
engineer" the table, simply erasing the bottom-most diagonal. The remaining table is still a legitimate divided
difference table since none of the remaining entries depends on any of the erased entries:

$x_0$   $f_{0,0}$   $f_{0,1}$   $f_{0,2}$
$x_1$   $f_{1,0}$   $f_{1,1}$
$x_2$   $f_{2,0}$

So

$$P_2(x) = f_{0,0} + f_{0,1}(x - x_0) + f_{0,2}(x - x_0)(x - x_1)$$

is one of the degree at most 2 interpolating polynomials. Erasing the top row of the table also leaves a legitimate
divided difference table:

$x_1$   $f_{1,0}$   $f_{1,1}$   $f_{1,2}$
$x_2$   $f_{2,0}$   $f_{2,1}$
$x_3$   $f_{3,0}$

so

$$Q_2(x) = f_{1,0} + f_{1,1}(x - x_1) + f_{1,2}(x - x_1)(x - x_2)$$

is another degree at most 2 interpolating polynomial. Notice that $P_2$ and $Q_2$ are not just different representations
of the same polynomial. They are two different polynomials! $P_2$ interpolates over the nodes $x_0, x_1, x_2$ while $Q_2$
interpolates over the nodes $x_1, x_2, x_3$.

The bottom diagonals of each truncated table give degree at most 2 interpolating polynomials as well. Remember,
$f_{i,j}$ represents the potential leading coefficient of the interpolating polynomial over the nodes $x_i, x_{i+1}, \ldots, x_{i+j}$.
Hence,

$$\tilde{Q}_2(x) = f_{3,0} + f_{2,1}(x - x_3) + f_{1,2}(x - x_3)(x - x_2)$$

interpolates over the nodes $x_3, x_2, x_1$ and

$$\tilde{P}_2(x) = f_{2,0} + f_{1,1}(x - x_2) + f_{0,2}(x - x_2)(x - x_1)$$

interpolates over the nodes $x_2, x_1, x_0$. These are not new polynomials. These are new representations for $P_2$ and
$Q_2$. Actually, $\tilde{P}_2 = P_2$ and $\tilde{Q}_2 = Q_2$.

The critical feature of each of these interpolating polynomial representations is that each successive coefficient
depends on all the same nodes as its predecessor, plus one new one. For example, $f_{2,0}$ depends on $x_2$, $f_{1,1}$ depends
on $x_2$ and $x_1$, and $f_{0,2}$ depends on $x_2$, $x_1$, and $x_0$. Hence, these three coefficients can be used to produce the
interpolating polynomial over the nodes $x_0, x_1, x_2$ in the form of polynomial $\tilde{P}_2$ (which, as we have already noted,
equals $P_2$). Another representation for the same polynomial can be written by utilizing $f_{1,0}$ (which depends on $x_1$),
$f_{0,1}$ (which depends on $x_1$ and $x_0$), and $f_{0,2}$ (which depends on $x_1, x_0, x_2$):

$$\hat{P}_2(x) = f_{1,0} + f_{0,1}(x - x_1) + f_{0,2}(x - x_1)(x - x_0)$$

to give a representation of the polynomial interpolating over $x_0, x_1, x_2$ (which, therefore, must equal $P_2$). There is
one more representation of $P_2$ that can be extracted from the original divided difference table. It comes from the
coefficients $f_{1,0}, f_{1,1}, f_{0,2}$. Can you write it down? Answer on page 126. There are two more representations of $Q_2$
that can be extracted from the original divided difference table. Can you write them down? Answers on page 126.
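The claims of this subsection are easy to test numerically. The following Octave sketch builds a four-node divided difference table for made-up data (the values in `x` and `y` are hypothetical), then compares the top-row form of $P_2$ with its bottom-diagonal form, and $P_2$ with $Q_2$. The array `F` uses 1-based indices, so `F(i,j)` holds $f_{i-1,j-1}$.

```octave
x = [0, 1, 3, 4];  y = [1, 3, 2, 5];   % hypothetical data
n = length(x);
F = zeros(n, n);
F(:,1) = y(:);
for j = 2:n
  for i = 1:n+1-j
    F(i,j) = (F(i,j-1) - F(i+1,j-1)) / (x(i) - x(i+j-1));   % recursion 3.3.3
  end
end

% P2 from the top row: f00, f01, f02 with nodes x0, x1
P2  = @(t) F(1,1) + F(1,2)*(t - x(1)) + F(1,3)*(t - x(1)).*(t - x(2));
% The same polynomial from the diagonal f20, f11, f02 with nodes x2, x1
P2d = @(t) F(3,1) + F(2,2)*(t - x(3)) + F(1,3)*(t - x(3)).*(t - x(2));
% Q2 from the second row: f10, f11, f12 with nodes x1, x2
Q2  = @(t) F(2,1) + F(2,2)*(t - x(2)) + F(2,3)*(t - x(2)).*(t - x(3));

t = linspace(-1, 5, 7);
disp(max(abs(P2(t) - P2d(t))))     % two representations, one polynomial
disp(abs(P2(x(4)) - Q2(x(4))))     % P2 and Q2 differ at x3
```

The first difference should be at rounding level, while the second is not small because $Q_2$ interpolates the data at $x_3$ and $P_2$ does not.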
Key Concepts

Newton form of an interpolating polynomial: The Newton form, $N_n$, of the polynomial of degree at most $n$ interpolating
the points $(x_0, y_0), (x_1, y_1), \ldots, (x_n, y_n)$ is

$$N_n(x) = a_0 + a_1(x - x_{i_0}) + a_2(x - x_{i_0})(x - x_{i_1}) + \cdots + a_n(x - x_{i_0})\cdots(x - x_{i_{n-1}})$$

for $n$ distinct indices $i_0, i_1, \ldots, i_{n-1}$ from the set $\{0, 1, 2, \ldots, n\}$. The Newton form for a particular set of data is not
unique.

Potential leading coefficient: For an interpolating polynomial on $k + 1$ points, the coefficient of its $x^k$ term is called its
potential leading coefficient.

Divided differences: The coefficients of the Newton form of an interpolating polynomial are called divided differences.


Exercises 

1. Modify the Neville's method pseudo-code on page 113 to produce pseudo-code for computing the coefficients of $N_n$.

2. Modify the Neville's method Octave code on page 114 to produce Octave code for computing the coefficients of $N_n$.
Test it by computing $N_2$ interpolating $f(x) = e^x$ at $x = 0, 1, 2$ and comparing your result to that on page 118.

3. Let $f(0.1) = 0.12$, $f(0.2) = 0.14$, $f(0.3) = 0.13$, and $f(0.4) = 0.15$.

(a) Find the leading coefficient of the polynomial of least degree interpolating these data.

(b) Suppose, additionally, that $f(0.5) = 0.11$. Use your previous work to find the leading coefficient of the polynomial
of least degree interpolating all of the data.

4. Find a Newton form of the polynomial of degree at most 3 interpolating the points (1,2), (2,2), (3,0) and (4,0).

5. Use the method of divided differences to find the at-most-second-degree polynomial interpolating the points (0, 10),
(30,58), (1029,-32). [A]

6. Use divided differences to find an interpolating polynomial for the data $f(1) = 0.987$, $f(2.2) = -0.123$, and $f(3) =
0.432$. [S]

7. Create a divided differences table for the following data using only pencil and paper.

$f(1.2) = 2.2$    $f(1.4) = 2.1$    $f(1.6) = 2.3$

(a) What is the interpolating polynomial of degree at most 2? Does it actually have degree 2?

(b) Write down two distinct linear interpolating polynomials for this data based on your table.

8. Use divided differences to find the at-most-cubic polynomial of exercise 19 of section 3.2. Does it have the expected
degree?

9. Find the degree at most two interpolating polynomial of the form

$$p_n(x) = a_0 + a_1(x - x_0) + a_2(x - x_0)(x - x_1) + \cdots + a_n(x - x_0)(x - x_1)\cdots(x - x_{n-1})$$

for the data in the table.

$x$      2   3   4
$f(x)$   3   5   4

10. Use the Octave code from question 2 to compute the interpolating polynomial of at most degree four for the data:

$x$     $f(x)$
0.0     -6.00000
0.1     -5.89483
0.3     -5.65014
0.6     -5.17788
1.0     -4.28172

Then add $f(1.1) = -3.9958$ to the table, and compute the interpolating polynomial of degree at most 5 using a
calculator. You may use the Octave code to check your work.

11. Use the Octave code from question 2 to find interpolating polynomials of degrees (at most) one, two, and three for
the following data. Approximate $f(8.4)$ using each polynomial.

$f(8.1) = 16.94410$, $f(8.3) = 17.56492$,
$f(8.6) = 18.50515$, $f(8.7) = 18.82091$
12. Find a bound on the error in using the interpolating polynomial of question 6 to approximate $f(2)$ assuming that all
derivatives of $f$ are bounded between $-2$ and 1 over the interval [1, 3].

13. Regarding the polynomial of question 9,

(a) use the polynomial to approximate $f(2.5)$; and

(b) assuming $f \in C^3$, find a theoretical bound on the error of approximating $f(x)$ on the interval [2, 4].

14. [A]

(a) Find an error bound, in terms of $f^{(4)}(\xi_{8.4})$, for the approximation $P_3(8.4)$ in question 11.

(b) Find an error bound, in terms of $f^{(4)}$, for the approximation $P_3(x)$ in question 11 good for any $x \in [8.1, 8.7]$.

(c) Suppose $f^{(4)}(x) = x\cos x - e^x$ for the function $f(x)$ of question 11. Use this information to find an error bound
for the approximation $P_3(x)$ good for any $x \in [8.1, 8.7]$.

15. Buck spilled coffee on his divided differences table, obscuring several numbers. Nevertheless, there is enough legible 
information to find the at-most-degree-3 polynomial interpolating the data. Find it. ^ 



16. Show that the polynomial interpolating the following data has degree 3.

$x$      -2   -1   0    1    2    3
$f(x)$   1    4    11   16   13   -4


17. For a function /, Newton’s divided difference formula gives the interpolating polynomial 


Ns(x) = 1 + 4* + Ax(x — 0.25) + — *(* ~ 0.25) (a; — 0.5) 

on the nodes $x_0 = 0$, $x_1 = 0.25$, $x_2 = 0.5$, $x_3 = 0.75$. Find $f(0.75)$.

18. Match the function with its Seeded Sidi method convergence diagram. In each case, Sidi's 6th degree method was used.
The real axis passes through the center of each diagram, and the imaginary axis is represented, but is not necessarily
centered.

$f(x) = \sin x$

$g(x) = \sin x - e^{-x}$

$h(x) = e^x + 2^{-x} + 2\cos x - 6$

$l(x) = 56 - 152x + 140x^2 - 17x^3 - 48x^4 + 9x^5$

19. Match the function with its Seeded Sidi method convergence diagram. The real axis passes through the center of each
diagram, and the imaginary axis is represented, but is not necessarily centered.

$f(x) = x^4 + 2x^2 + 4$

$g(x) = (x^2)(\ln x) + (x - 3)e^x$

$h(x) = 1 + 2x + 3x^2 + 4x^3 + 5x^4 + 6x^5$

l(x) = (lnx)(*‘ ! + 1) 



20. You have found the following Octave function with no comments (boo to the author of the function!).

    function ans = foo(x,y,x0)
      n = length(x);
      ans = 0;
      for i=1:n
        a=1;
        for j=1:n
          if (j == i)
            a=a*y(i);
          else
            a=a*(x0-x(j))/(x(i)-x(j));
          endif
        endfor
        ans=ans+a;
      endfor
    endfunction

What is the output (ans) of the Octave command

    foo([1.1, 1.2, 1.3, 1.4], [-78, .81, .79, .75], 1.2)

and why?


Answers 

$P_2$ from $f_{1,0}, f_{1,1}, f_{0,2}$: $P_2(x) = f_{1,0} + f_{1,1}(x - x_1) + f_{0,2}(x - x_1)(x - x_2)$

$Q_2$ two new ways: $Q_2(x) = f_{2,0} + f_{1,1}(x - x_2) + f_{1,2}(x - x_2)(x - x_1)$ and $Q_2(x) = f_{2,0} + f_{2,1}(x - x_2) + f_{1,2}(x - x_2)(x - x_3)$




Numerical Calculus 


4.1 Rudiments of Numerical Calculus 

The basic idea 

g(x) = x − ^ sin(x) has a root between 0 and $\pi$. You are trying various methods and become interested in how
the choice of initial value affects the results. Using Newton's method, you do some research into how the choice of
$x_0$ affects $x_2$. You run some tests and come up with the following data.

$x_0$      $x_2$
93/70      2.084603181618954
95/70      2.055494116570853
97/70      2.030278824314539
99/70      2.009751835391139
101/70     1.993574976724822
103/70     1.981091507449763
105/70     1.971614474758557

Using fixed point iteration on f(x) = ^ sin(x), you decide to examine how the choice of $x_0$ affects $x_{10}$, not $x_2$, since
fixed point iteration generally converges slowly. You run some tests on this method and come up with the following
data.

$x_0$      $x_{10}$
1/7        1.949880891899200
2/7        1.951091775564697
3/7        1.923339403354019
4/7        1.941460911122824
5/7        1.960870620285721
6/7        1.965674866641883
1          1.961228252911260


In the Newton's method experiment, $x_2$ is a function of $x_0$, and in the fixed point iteration experiment, $x_{10}$ is
a function of $x_0$. So you start to think of them completely independently from the original root-finding question.
As they sit in their tabular form, they are just two functions for which you know a handful of values and not much
more. What do these functions look like? Do we have enough information to perhaps find their derivatives, and,
hence, local extrema? Can we find their antiderivatives? This is the stuff of numerical calculus. We can certainly
approximate these things.

In chapter 3 we learned how to approximate functions by interpolation, so we know we can use the tabular data
to approximate the functions themselves. But what about their derivatives and integrals? Well, polynomials are
easy to differentiate and integrate. Perhaps we can use the derivatives and integrals of interpolating polynomials
to approximate the derivatives and integrals of $x_2(x_0)$ and $x_{10}(x_0)$. Indeed we can!

In order to avoid the confusion of using $x_0$ for multiple purposes, we will rename our functions $\nu(x)$ for $x_2(x_0)$
and $\varphi(x)$ for $x_{10}(x_0)$. Hence, we have $\nu(93/70) = 2.0846\ldots$, $\nu(95/70) = 2.0554\ldots$, and so on. Similarly, we
have now $\varphi(1/7) = 1.9498\ldots$, $\varphi(2/7) = 1.9510\ldots$, and so on. We will also take up the practice of calling the
$x$-coordinates of the prescribed interpolation points nodes. Hence, the nodes we have for $\nu$ are 93/70, 95/70, and
so on. The nodes we have for $\varphi$ are 1/7, 2/7, and so on.


Crumpet 24: $\nu$ and $\varphi$


$\nu$ is the (lower case) thirteenth letter of the Greek alphabet and is pronounced noo. $\varphi$ is the (lower case) twenty-first
letter of the Greek alphabet and is pronounced fee. The letter fee is also written $\phi$, but in mathematics it
is much more common to see the variant $\varphi$, perhaps to avoid confusion between fee and the empty set, $\emptyset$. The
capital versions of $\nu$ and $\varphi$ are $N$ and $\Phi$, respectively.


We begin by considering interpolating polynomials on three nodes. For $\nu$, we use the nodes 93/70, 99/70, and
1.5, and get

$$P_{2,\nu}(x) = 2.498590686342254x^2 - 7.726543017101505x + 7.939599956140455.$$

For $\varphi$, we use the nodes 1/7, 4/7, and 1, and get

$$P_{2,\varphi}(x) = .07673215587088045x^2 - .07445530457646088x + 1.95895140161684.$$

We have added a second subscript to $P_2$ in order to distinguish the interpolating polynomial for $\nu$ from that for $\varphi$.
Now we can approximate derivatives and integrals for both $\nu$ and $\varphi$ using $P_{2,\nu}$ and $P_{2,\varphi}$, respectively:

$$P'_{2,\nu}(x) = 4.997181372684508x - 7.726543017101505$$
$$P'_{2,\varphi}(x) = .1534643117417609x - .07445530457646088$$
$$\int P_{2,\nu}\,dx = .8328635621140847x^3 - 3.863271508550753x^2 + 7.939599956140455x + C$$
$$\int P_{2,\varphi}\,dx = .02557738529029348x^3 - .03722765228823044x^2 + 1.95895140161684x + D.$$

So, for example,

$$\nu'(1.4) \approx P'_{2,\nu}(1.4) = 4.997181372684508(1.4) - 7.726543017101505 = -.7304890953431942$$
$$\varphi'(0.5) \approx P'_{2,\varphi}(0.5) = .1534643117417609(0.5) - .07445530457646088 = .002276851294419568$$

and

$$\int_{1.4}^{1.5} \nu(x)\,dx \approx \int_{1.4}^{1.5} P_{2,\nu}(x)\,dx = \left[.8328635621140847x^3 - 3.863271508550753x^2 + 7.939599956140455x\right]_{1.4}^{1.5} = .1991481658283149$$
$$\int_0^1 \varphi(x)\,dx \approx \int_0^1 P_{2,\varphi}(x)\,dx = \left[.02557738529029348x^3 - .03722765228823044x^2 + 1.95895140161684x\right]_0^1 = 1.947301134618903.$$

That's it! This exercise encapsulates the entire strategy. Given some values of an otherwise unknown function, we
will approximate the unknown function with a polynomial. We will then approximate derivatives and integrals of
the unknown function by differentiating and integrating the polynomial. There is very little more to be said about
the idea. There is, however, a lot more to be said about automation, accuracy, and efficiency, the focus of the rest
of the chapter. But before we tackle those issues, we will have another look at $\nu$ and $\varphi$.

Table 4.1: Estimating the derivatives and integrals of $\nu$ and $\varphi$.

quantity                            using $P_2$              using $P_6$
$\nu'(1.4)$                         -.7304890953431942       -.7178145479410887
$\varphi'(0.5)$                     .002276851294419568      .1447147284558277
$\int_{1.4}^{1.5} \nu(x)\,dx$       .1991481658283149        .1991932206801721
$\int_0^1 \varphi(x)\,dx$           1.947301134618903        1.925578216262883
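The $P_2$ estimates for $\nu$ can be reproduced with a few built-in Octave polynomial routines. This is a sketch, not the method the chapter goes on to develop: it leans on `polyfit`, `polyder`, `polyint`, and `polyval`, and the variable names are chosen here for illustration. With three points and degree 2, `polyfit` returns the interpolating parabola (up to rounding).

```octave
% Three nodes for nu and the tabulated values of nu at those nodes
x  = [93, 99, 105] / 70;
nu = [2.084603181618954, 2.009751835391139, 1.971614474758557];

p  = polyfit(x, nu, 2);   % coefficients of P_{2,nu}, highest power first
dp = polyder(p);          % coefficients of its derivative
ip = polyint(p);          % coefficients of an antiderivative

d_est = polyval(dp, 1.4);                       % estimate of nu'(1.4)
i_est = polyval(ip, 1.5) - polyval(ip, 1.4);    % estimate of the integral over [1.4, 1.5]
printf("%.10f  %.10f\n", d_est, i_est);
```

The two printed numbers should match the $P_2$ estimates of $\nu'(1.4)$ and $\int_{1.4}^{1.5}\nu(x)\,dx$ to roughly the precision that the least-squares fit allows.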

Using all the nodes of $\nu$, and the help of a computer algebra system, we compute the sixth degree interpolating
polynomial

$$P_{6,\nu}(x) = -1342.393417879939x^6 + 11632.43754466623x^5 - 41996.4789301455x^4 + 80851.91317212582x^3 - 87536.60487741232x^2 + 50528.3026241064x - 12144.27629915625.$$

Using all the nodes of $\varphi$ (and a computer algebra system) we compute the sixth degree interpolating polynomial

$$P_{6,\varphi}(x) = -25.41848741926543x^6 + 97.00017832506126x^5 - 147.1805326076494x^4 + 111.7996194440324x^3 - 43.71110414341027x^2 + 8.049781257197147x + 1.421773396945804.$$

Again we have added a second subscript in order to distinguish the interpolating polynomial for $\nu$ from that for $\varphi$.


Now we can get second estimates for $\nu'(1.4)$, $\varphi'(0.5)$, $\int_{1.4}^{1.5} \nu\,dx$, and $\int_0^1 \varphi\,dx$:

$$\nu'(1.4) \approx P'_{6,\nu}(1.4) \approx -.7178145479410887$$
$$\varphi'(0.5) \approx P'_{6,\varphi}(0.5) \approx .1447147284558277$$
$$\int_{1.4}^{1.5} \nu(x)\,dx \approx \int_{1.4}^{1.5} P_{6,\nu}(x)\,dx \approx .1991932206801721$$
$$\int_0^1 \varphi(x)\,dx \approx \int_0^1 P_{6,\varphi}(x)\,dx \approx 1.925578216262883.$$

Table 4.1 summarizes the eight estimates we have made so far. The first four digits of the estimates of $\int_{1.4}^{1.5} \nu(x)\,dx$
agree, and the first two of $\int_0^1 \varphi(x)\,dx$ agree. So there is some agreement for the estimates of the integrals. The
estimates for the derivatives don't agree quite as well, however. The estimates for $\nu'(1.4)$ only agree in their first
significant digit. They both suggest $\nu'(1.4) \approx -.7$. But there is essentially no agreement between the estimates of
$\varphi'(0.5)$. One approximation is more than 60 times the other! Based on this simple analysis, we should have a hard
time believing either estimate of $\varphi'(0.5)$. And we should only trust the first few digits of the others. We will see
later that we can use this type of comparison to have the computer decide whether an approximation is good or
not.


Issues 

There are three issues with the method of estimating derivatives and integrals just outlined. 

1. Efficiency. For illustrative purposes and understanding the basic concept of numerical calculus, it is a good
idea to calculate some interpolating polynomials as done in the previous subsection. However, it is cumbersome
and time-consuming to do so. We will dedicate significant energy to finding shortcuts to this direct method,
thus making it more efficient and practical.

2. Automation. Numerical methods are meant to be run by a computer, not a human with a calculator. We
need to find ways that a computer can handle interpolating polynomials. This issue has intimate ties with
efficiency. After all, what will make an algorithm efficient is if it can be executed quickly by a computer!

3. Accuracy. So far we have done very little to determine how accurate our approximations are. We need to
get a better handle on the error terms in order to understand how to use the method accurately.

Presently, we make strides toward addressing all three of these issues, but we leave the bulk of it for the upcoming 
sections. 

In chapter 3, we labeled the nodes of an interpolating function $x_0, x_1, \ldots, x_n$. It will be beneficial to begin calling
them $x_0 + \theta_0 h, x_0 + \theta_1 h, \ldots, x_0 + \theta_n h$ instead. And for most of our analysis, we will use $x_0 + \theta h$ instead of $x$ for
the point at which we desire an estimate. One might call this substitution a change of variables or a recalibration
of the $x$-axis.

To see how this helps with the analysis, consider the degree at most 2 interpolating polynomial of $f$ with nodes

$$x_0 + \theta_0 h,\quad x_0 + \theta_1 h,\quad \text{and}\quad x_0 + \theta_2 h.$$

In the notation of chapter 3, we have

$$P_2(x) = \frac{(x - x_1)(x - x_2)}{(x_0 - x_1)(x_0 - x_2)} f(x_0) + \frac{(x - x_0)(x - x_2)}{(x_1 - x_0)(x_1 - x_2)} f(x_1) + \frac{(x - x_0)(x - x_1)}{(x_2 - x_0)(x_2 - x_1)} f(x_2)$$

but with the new notation, we replace $x_0$ by $x_0 + \theta_0 h$, $x_1$ by $x_0 + \theta_1 h$, $x_2$ by $x_0 + \theta_2 h$, and $x$ by $x_0 + \theta h$, giving us

$$P_2(x_0 + \theta h) = \frac{(\theta - \theta_1)(\theta - \theta_2)}{(\theta_0 - \theta_1)(\theta_0 - \theta_2)} f(x_0 + \theta_0 h) + \frac{(\theta - \theta_0)(\theta - \theta_2)}{(\theta_1 - \theta_0)(\theta_1 - \theta_2)} f(x_0 + \theta_1 h) + \frac{(\theta - \theta_0)(\theta - \theta_1)}{(\theta_2 - \theta_0)(\theta_2 - \theta_1)} f(x_0 + \theta_2 h). \tag{4.1.1}$$

For the most part, we have just swapped $x$ for $\theta$ and $x_i$ for $\theta_i$. This benign-looking change is actually a huge step
forward! This formula makes it apparent that the actual values of the $x_i$ are not important. It is only their location
relative to some base point, $x_0$, measured by some characteristic length, $h$, that matters. $\theta$ and the $\theta_i$ are those
measures. Essentially this makes $x_0$ the origin and $h$ the unit of measure on the $x$-axis. We measure all values by
how many lengths of $h$ they are from $x_0$.

To illustrate the benefit, let us assume that we have three nodes, equally spaced, so the least and greatest
nodes are the same distance from the third, middle node. Setting the central node as the base point, $x_0$, and the
characteristic length, $h$, to the distance from this central node to the others, we can then label them

$$x_0 - h,\quad x_0,\quad \text{and}\quad x_0 + h.$$

And we have already arrived at the essential point. It doesn't matter if the set of nodes is {1, 2, 3} or {80, 90, 100}
or {-4.3, -4.2, -4.1}. In each of these sets, we have three nodes, one of which is the midpoint of the other two.
Each set of nodes is equal to the set $\{x_0 - h, x_0, x_0 + h\}$ for some values of $x_0$ and $h$. Hence, if we can do any
analysis with the set $\{x_0 - h, x_0, x_0 + h\}$, then we get information about working with any of the sets of nodes
{1, 2, 3} or {80, 90, 100} or {-4.3, -4.2, -4.1} and so on.

Back to the set of nodes $\{x_0 - h, x_0, x_0 + h\}$. For this set of nodes, we have $\theta_0 = -1$, $\theta_1 = 0$, and $\theta_2 = 1$.
Substituting into 4.1.1,

$$P_2(x_0 + \theta h) = \frac{\theta(\theta - 1)}{(-1)(-2)} f(x_0 - h) + \frac{(\theta + 1)(\theta - 1)}{(1)(-1)} f(x_0) + \frac{(\theta + 1)\theta}{(2)(1)} f(x_0 + h)$$
$$= \frac{\theta(\theta - 1)}{2} f(x_0 - h) + (1 - \theta^2) f(x_0) + \frac{\theta(\theta + 1)}{2} f(x_0 + h).$$

Now this formula can be used to get the interpolating parabola over any set of three equally spaced nodes.
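To see the formula at work, here is a small Octave sketch that evaluates the parabola through three equally spaced nodes as a function of $\theta$. The function handle `P2`, the test function `f`, and the node set {80, 90, 100} are illustrative choices; a quadratic test function is used so that the parabola should reproduce it exactly, up to rounding.

```octave
% Parabola through (x0-h, fm), (x0, f0), (x0+h, fp), evaluated at x0 + theta*h
P2 = @(theta, fm, f0, fp) theta.*(theta - 1)/2 .* fm ...
                        + (1 - theta.^2)       .* f0 ...
                        + theta.*(theta + 1)/2 .* fp;

f  = @(x) x.^2 - 3*x + 1;     % hypothetical test function (a parabola)
x0 = 90;  h = 10;             % nodes {80, 90, 100}
fm = f(x0 - h);  f0 = f(x0);  fp = f(x0 + h);

% At theta = -1, 0, 1 the formula returns the nodal values
vals = P2([-1, 0, 1], fm, f0, fp);
disp(vals - [fm, f0, fp])

% Between the nodes, it reproduces the quadratic test function
disp(P2(0.3, fm, f0, fp) - f(x0 + 0.3*h))
```

Changing `x0` and `h` moves and rescales the nodes without touching `P2`, which is the point of the $\theta$ notation.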

In an attempt to apply this formula to $\nu$, consider the nodes 93/70, 99/70, and 105/70. Since $\frac{99}{70} - \frac{93}{70} = \frac{105}{70} - \frac{99}{70} = \frac{3}{35}$,
we have a set of nodes of the form $\{x_0 - h, x_0, x_0 + h\}$ with $x_0 = \frac{99}{70}$ and $h = \frac{3}{35}$. It just so happens that
$1.4 = \frac{99}{70} - \frac{1}{6}\cdot\frac{3}{35}$, so we use $\theta = -\frac{1}{6}$ to calculate $P_{2,\nu}(1.4)$:

$$P_{2,\nu}(1.4) = P_{2,\nu}\left(\frac{99}{70} - \frac{1}{6}\cdot\frac{3}{35}\right) = \frac{7\nu\left(\frac{93}{70}\right) + 70\nu\left(\frac{99}{70}\right) - 5\nu\left(\frac{105}{70}\right)}{72}$$
$$= \frac{7(2.084603181618954) + 70(2.009751835391139) - 5(1.971614474758557)}{72}$$
$$= 2.019677477429439.$$

This seems a pretty good estimate since it is between $\nu(93/70) \approx 2.085$ and $\nu(99/70) \approx 2.009$ but significantly closer
to 2.009. After all, 1.4 is between $93/70 \approx 1.328$ and $99/70 \approx 1.414$ but significantly closer to 1.414. Equation 3.2.3
gives us some idea how good we might expect this estimate to be.

But let's back this calculation up just a couple steps. The constants of the
$\frac{7\nu\left(\frac{93}{70}\right) + 70\nu\left(\frac{99}{70}\right) - 5\nu\left(\frac{105}{70}\right)}{72}$ step
were determined purely from the values of $\theta$ and the $\theta_i$. And the $\frac{93}{70}$, $\frac{99}{70}$, and $\frac{105}{70}$ are just the three nodes, $x_0 - h$, $x_0$, $x_0 + h$,
so what we really have here is a prescription, or formula, for the value $P_2(x_0 - \frac{1}{6}h)$ for any interpolating
polynomial of degree at most 2 over the nodes $x_0 - h$, $x_0$, and $x_0 + h$:

$$P_{2,\nu}\left(x_0 - \frac{1}{6}h\right) = \frac{7\nu(x_0 - h) + 70\nu(x_0) - 5\nu(x_0 + h)}{72}.$$


And there is nothing special about the particular $\nu$ in this formula either. None of the constants 7, 70, $-5$,
or 72 depends on $\nu$; they depend only on the spacing of the nodes. Therefore, given any function $f$,
we can extract from this calculation the succinct approximation formula

$$f\left(x_0 - \frac{1}{6}h\right) \approx \frac{7f(x_0 - h) + 70f(x_0) - 5f(x_0 + h)}{72}. \tag{4.1.2}$$


This formula illustrates the real purpose in reframing the values of the $x_i$ in terms of $x_0$, $h$, and the $\theta_i$. This way,
we get formulas applicable to a whole class of nodes, not just one particular set of nodes.
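The weights in formula 4.1.2 can be reproduced mechanically. The following is a minimal sketch, not the text's own code: the function name `value_weights` is hypothetical, and it simply evaluates the Lagrange factors of 4.1.1 in exact rational arithmetic before applying them to the three tabulated values of $\nu$.

```python
from fractions import Fraction

def value_weights(thetas, theta):
    """Lagrange basis values at theta for nodes x0 + thetas[i]*h (exact arithmetic)."""
    weights = []
    for i, ti in enumerate(thetas):
        w = Fraction(1)
        for j, tj in enumerate(thetas):
            if j != i:
                w *= Fraction(theta - tj) / Fraction(ti - tj)
        weights.append(w)
    return weights

# Stencil of formula 4.1.2: nodes x0-h, x0, x0+h, evaluated at x0 - h/6.
weights = value_weights([-1, 0, 1], Fraction(-1, 6))
print(weights)  # the fractions 7/72, 35/36 (= 70/72), and -5/72

# Apply the weights to the tabulated values of nu at 93/70, 99/70, 105/70.
nu_values = [2.084603181618954, 2.009751835391139, 1.971614474758557]
estimate = sum(float(w) * v for w, v in zip(weights, nu_values))
print(estimate)  # approximately 2.0196774774
```

Because the weights depend only on the $\theta$ values, the same list can be reused for any $x_0$ and $h$ that fit the stencil.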

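As for $\varphi$, the nodes $\frac{1}{7}$, $\frac{4}{7}$, and 1 are equally spaced, so the set $\{\frac{1}{7}, \frac{4}{7}, 1\}$ has the form $\{x_0 - h, x_0, x_0 + h\}$ where
$x_0 = \frac{4}{7}$ and $h = \frac{3}{7}$. Not by accident, it happens that $\frac{4}{7} - \frac{1}{6} \cdot \frac{3}{7} = 0.5$, so $\varphi(0.5) = \varphi(x_0 - \frac{1}{6}h)$ where $x_0 = \frac{4}{7}$ and
$h = \frac{3}{7}$. Now we can use formula 4.1.2 to approximate $\varphi(0.5)$!

$$\varphi(0.5) \approx P_{2,\varphi}(0.5) = \frac{7\varphi(x_0 - h) + 70\varphi(x_0) - 5\varphi(x_0 + h)}{72}$$

$$= \frac{7(1.9498808918992) + 70(1.941460911122824) - 5(1.96122825291126)}{72}$$

$$= 1.94090678829633.$$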


This time, we have completely circumvented any direct calculation and evaluation of $P_{2,\varphi}$. Formula 4.1.2 allows us
to calculate $P_{2,\varphi}(0.5)$ directly from the values of $\varphi$ at the three nodes. No need to calculate, refer back to, evaluate,
or simplify $P_{2,\varphi}$! All of that has been done in deriving the formula. Very quick. Very efficient.

Stencils 

A formula such as 4.1.2 is only applicable to a set of nodes and point of evaluation with the same geometry (relative
positioning) as those used to derive the formula. Therefore, it will be important to keep track of the geometry used
to derive such formulas. To that end, we often refer to a particular set of nodes with its corresponding point of
evaluation as a stencil. For example, the nodes $x_0 - h$, $x_0$, $x_0 + h$ with point of evaluation $x_0 - \frac{1}{6}h$ form a stencil, a
relative positioning of points that can be scaled (by changing the value of $h$) and translated (by changing the value
of $x_0$). On a number line, this particular stencil looks like

[Stencil: nodes at $x_0 - h$, $x_0$, and $x_0 + h$; point of evaluation at $x_0 - \frac{1}{6}h$.]

$x_0$ can be located anywhere and $h$ can be any size, even negative. It is this flexibility that makes formulas like 4.1.2
useful.

Now let's suppose we do not have evenly spaced data, but we are interested in a point midway between two
others. An appropriate three-point stencil would use the nodes $x_0 - h$, the leftmost node, $x_0 + h$, the rightmost
node, $x_0 + \theta_1 h$ for some $\theta_1$ between $-1$ and $1$, the middle node, and point of evaluation $x_0$, the point midway
between the leftmost and rightmost nodes. For $\theta_1 = \frac{1}{3}$, this stencil looks like

[Stencil: nodes at $x_0 - h$, $x_0 + \frac{1}{3}h$, and $x_0 + h$; point of evaluation at $x_0$.]

And we can derive a formula for $P_2(x_0)$ based on the values of $f$ at the three nodes. Plugging $\theta = 0$, $\theta_0 = -1$,
$\theta_1 = \frac{1}{3}$, and $\theta_2 = 1$ into equation 4.1.1, we get

$$P_2(x_0) = \frac{f(x_0 - h) + 9f(x_0 + \frac{1}{3}h) - 2f(x_0 + h)}{8},$$

again a succinct formula applicable to any function $f$. No need to calculate the interpolating polynomial or evaluate
it directly for any data that fit this stencil. That part has already been done and simplified.
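For readers who want to check the arithmetic, the three coefficients follow from evaluating each Lagrange factor of 4.1.1 at $\theta = 0$ with $\theta_0 = -1$, $\theta_1 = \frac{1}{3}$, $\theta_2 = 1$. This is a worked check using only the values stated above:

```latex
\begin{align*}
\frac{(0 - \tfrac13)(0 - 1)}{(-1 - \tfrac13)(-1 - 1)} &= \frac{1/3}{8/3} = \frac{1}{8}, \\
\frac{(0 + 1)(0 - 1)}{(\tfrac13 + 1)(\tfrac13 - 1)} &= \frac{-1}{-8/9} = \frac{9}{8}, \\
\frac{(0 + 1)(0 - \tfrac13)}{(1 + 1)(1 - \tfrac13)} &= \frac{-1/3}{4/3} = -\frac{2}{8}.
\end{align*}
```

The three weights sum to 1, as they must for the formula to reproduce constant functions exactly.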


Derivatives 

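Derivative formulas can be derived likewise. Once derived for a given stencil, they can be used very easily and
efficiently for other data fitting the same stencil. We now find the formula for the first derivative, $P_2'(x_0 - \frac{1}{6}h)$, over
the stencil

[Stencil: nodes at $x_0 - h$, $x_0$, and $x_0 + h$; point of evaluation at $x_0 - \frac{1}{6}h$.]

used earlier. We begin by recognizing that in 4.1.1 $x$ is a function of $\theta$. In particular, $x(\theta) = x_0 + h\theta$, so $\frac{d}{d\theta}x(\theta) = h$.
By the chain rule, $\frac{d}{d\theta}P_2(x(\theta)) = \frac{d}{dx}P_2(x) \cdot \frac{d}{d\theta}x(\theta) = h\frac{d}{dx}P_2(x)$. From equation 4.1.1, we then have

$$\frac{d}{dx}P_2(x) = \frac{(\theta - \theta_1) + (\theta - \theta_2)}{h(\theta_0 - \theta_1)(\theta_0 - \theta_2)} f(x_0 + \theta_0 h) + \frac{(\theta - \theta_0) + (\theta - \theta_2)}{h(\theta_1 - \theta_0)(\theta_1 - \theta_2)} f(x_0 + \theta_1 h) + \frac{(\theta - \theta_0) + (\theta - \theta_1)}{h(\theta_2 - \theta_0)(\theta_2 - \theta_1)} f(x_0 + \theta_2 h). \tag{4.1.3}$$

In particular, when $\theta_0 = -1$, $\theta_1 = 0$, $\theta_2 = 1$, and $\theta = -\frac{1}{6}$, we have

$$P_2'\left(x_0 - \frac{1}{6}h\right) = \frac{-\frac{1}{6} - \frac{7}{6}}{h(-1)(-2)} f(x_0 - h) + \frac{\frac{5}{6} - \frac{7}{6}}{h(1)(-1)} f(x_0) + \frac{\frac{5}{6} - \frac{1}{6}}{h(2)(1)} f(x_0 + h)$$

$$= \frac{-2f(x_0 - h) + f(x_0) + f(x_0 + h)}{3h}. \tag{4.1.4}$$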


We now have a formula for $P_2'(x_0 - \frac{1}{6}h) \approx f'(x_0 - \frac{1}{6}h)$ for the stencil with nodes $x_0 - h$, $x_0$, $x_0 + h$ and $x = x_0 - \frac{1}{6}h$.
We can now apply this formula to approximate $\nu'(1.4)$ and $\varphi'(0.5)$.

$$\nu'(1.4) \approx \frac{-2\nu\left(\frac{93}{70}\right) + \nu\left(\frac{99}{70}\right) + \nu\left(\frac{105}{70}\right)}{3\left(\frac{3}{35}\right)}$$

$$= \frac{-2(2.084603181618954) + 2.009751835391139 + 1.971614474758557}{9/35}$$

$$= -0.7304890953430477.$$

Notice this is not exactly what we got in table 4.1 for $\nu'(1.4)$ using $P_2$. The two estimates differ in the last few
digits. This is due to floating-point error affecting the calculations in different ways. Generally there is more error
in calculating directly from the interpolating polynomial because the data are processed much more heavily. Best
not to trust the last several digits in either calculation, however. Now

$$\varphi'(0.5) \approx \frac{-2\varphi\left(\frac{1}{7}\right) + \varphi\left(\frac{4}{7}\right) + \varphi(1)}{3\left(\frac{3}{7}\right)}$$

$$= \frac{-2(1.9498808918992) + 1.941460911122824 + 1.96122825291126}{9/7}$$

$$= 0.002276851294420679.$$


Again, this is close to the approximation in table 4.1, but not exactly the same due to different floating-point errors 
for the two calculations. But the point is made. Using a formula based on a stencil is preferable to working directly 
from the interpolating polynomial. It is easier, more efficient, and can be automated. 
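To illustrate the claim that stencil formulas can be automated, here is a small sketch under stated assumptions: the helper `derivative_weights` is a hypothetical name, and it just evaluates the three coefficients of formula 4.1.3 (scaled by $h$) in exact arithmetic for a three-node stencil.

```python
from fractions import Fraction

def derivative_weights(thetas, theta):
    """Coefficients of f(x0 + thetas[i]*h) in formula 4.1.3, each multiplied by h."""
    t0, t1, t2 = (Fraction(t) for t in thetas)
    theta = Fraction(theta)
    return [
        ((theta - t1) + (theta - t2)) / ((t0 - t1) * (t0 - t2)),
        ((theta - t0) + (theta - t2)) / ((t1 - t0) * (t1 - t2)),
        ((theta - t0) + (theta - t1)) / ((t2 - t0) * (t2 - t1)),
    ]

scaled = derivative_weights([-1, 0, 1], Fraction(-1, 6))
print(scaled)  # -2/3, 1/3, 1/3, matching formula 4.1.4 after dividing by h

# Apply to the nu data: nodes 93/70, 99/70, 105/70, so h = 3/35.
h = 3 / 35
nu_values = [2.084603181618954, 2.009751835391139, 1.971614474758557]
slope = sum(float(w) * v for w, v in zip(scaled, nu_values)) / h
print(slope)  # approximately -0.7305
```

Calling `derivative_weights([-1, 0, 1], 0)` instead gives the weights of the centered stencil, where the point of evaluation is the middle node.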

Before moving on to integration, we make one more observation. When trying to approximate $f$ using an
interpolating polynomial, it does not make much sense to consider a stencil like

[Stencil: nodes at $x_0 - h$, $x_0$, and $x_0 + h$; point of evaluation at the node $x_0$.]

where the point of evaluation is one of the nodes. We know, by definition of $P_n$, that $P_n(x_i) = f(x_i)$ for each
node $x_i$. Hence, the "formula" would be $f(x_i) = P_2(x_i)$, and it would be exact, not an approximation. And not
particularly informative since this is one of the facts from which we calculated $P_2$! On the other hand, it does make
sense to consider such a stencil when trying to approximate derivatives of $f$. There is no guarantee the derivative
of $P_n$ will agree with the derivative of $f$ anywhere, even at the nodes. Substituting $\theta_0 = -1$, $\theta_1 = 0$, $\theta_2 = 1$, and
$\theta = 0$ into 4.1.3, we find

$$P_2'(x_0) = \frac{-1}{h(-1)(-2)} f(x_0 - h) + \frac{1 + (-1)}{h(1)(-1)} f(x_0) + \frac{1}{h(2)(1)} f(x_0 + h) = \frac{f(x_0 + h) - f(x_0 - h)}{2h} \tag{4.1.5}$$

for example.


Integrals 

For integration formulas, we use a modified stencil. We need the nodes plus the endpoints of integration, which will 
be identified by square brackets, [ for the left endpoint and ] for the right endpoint. But the process is analogous. 
We find a formula for the interpolating polynomial and, in place of integrating the unknown function, we integrate 
the interpolating polynomial. 

Following this procedure, we can derive a formula for the integral of $f$ over the stencil

[Stencil: nodes at $x_0$, $x_0 + h$, $x_0 + 2h$, $x_0 + 3h$, $x_0 + 4h$, $x_0 + 5h$, and $x_0 + 6h$; left endpoint [ at $x_0 + 2.5h$ and right endpoint ] at $x_0 + 6h$.]

for example. The algebra is straightforward but tedious, so we do not show it here. It is best to use a computer
algebra system to derive such a formula. The result, an approximation of the integral over $[x_0 + 2.5h, x_0 + 6h]$ using
nodes $x_0$, $x_0 + h$, $x_0 + 2h$, $x_0 + 3h$, $x_0 + 4h$, $x_0 + 5h$, and $x_0 + 6h$, is

$$\int_{x_0 + 2.5h}^{x_0 + 6h} f(x)\,dx \approx \frac{h}{138240}\left[42056 f(x_0 + 6h) + 201831 f(x_0 + 5h) + 63357 f(x_0 + 4h) + 195902 f(x_0 + 3h) - 28518 f(x_0 + 2h) + 10731 f(x_0 + h) - 1519 f(x_0)\right].$$

This formula can now be used to approximate $\int_{1.4}^{1.5} \nu(x)\,dx$ instead of integrating the interpolating polynomial
directly as done on page 129. You are invited to plug in the appropriate values of $\nu$ and compare your answer to
the one in the table on page 129. Answer on page 136.

The stencil for the approximation of $\int_0^1 \varphi(x)\,dx$ using $P_{6,\varphi}$ looks like

[Stencil: nodes at $x_0$, $x_0 + h$, $x_0 + 2h$, $x_0 + 3h$, $x_0 + 4h$, $x_0 + 5h$, and $x_0 + 6h$; left endpoint [ at $x_0 - h$ and right endpoint ] at $x_0 + 6h$.]

different from the one we used to approximate $\int_{1.4}^{1.5} \nu(x)\,dx$. Consequently, the approximation formula is different
too. We need a formula for the integral over $[x_0 - h, x_0 + 6h]$ with nodes $x_0$, $x_0 + h$, $x_0 + 2h$, $x_0 + 3h$, $x_0 + 4h$,
$x_0 + 5h$, and $x_0 + 6h$. The nodes are the same as before, but the interval of integration is different. The result is

$$\int_{x_0 - h}^{x_0 + 6h} f(x)\,dx \approx \frac{h}{8640}\left[5257 f(x_0 + 6h) - 5880 f(x_0 + 5h) + 59829 f(x_0 + 4h) - 81536 f(x_0 + 3h) + 102459 f(x_0 + 2h) - 50568 f(x_0 + h) + 30919 f(x_0)\right]. \tag{4.1.6}$$

Again, a computer algebra system should be used to derive such a formula. You are now invited to plug in the
appropriate values of $\varphi$ to approximate $\int_0^1 \varphi(x)\,dx$ and compare your result to the one in the table on page 129. Answer
on page 136.
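One way to gain confidence in a formula like 4.1.6 before using it is to confirm that it integrates low-degree polynomials exactly. The sketch below is illustrative only; the function name `formula_4_1_6` and the choice $x_0 = 0$, $h = 1$ are assumptions made for the check, and exact rational arithmetic is used so no rounding enters.

```python
from fractions import Fraction

# Coefficients of f(x0), f(x0+h), ..., f(x0+6h) in formula 4.1.6, before the factor h/8640.
COEFFS = [30919, -50568, 102459, -81536, 59829, -5880, 5257]

def formula_4_1_6(f, x0, h):
    """Approximate the integral of f over [x0 - h, x0 + 6h] with formula 4.1.6."""
    total = sum(c * f(x0 + i * h) for i, c in enumerate(COEFFS))
    return h * total / 8640

# With x0 = 0 and h = 1 the interval is [-1, 6]; compare against exact integrals of x^j.
for j in range(7):
    approx = formula_4_1_6(lambda x: Fraction(x) ** j, Fraction(0), Fraction(1))
    exact = (Fraction(6) ** (j + 1) - Fraction(-1) ** (j + 1)) / (j + 1)
    print(j, approx, exact, approx == exact)
```

The same function accepts floating-point arguments, so it can also be applied to tabulated data with the appropriate $x_0$ and $h$.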

Key Concepts 

node: the abscissa (first coordinate) of a data point used in interpolation. 

polynomial approximation: approximating the value of a function, its derivative or integral based on the
corresponding value of an interpolating polynomial.

stencil: relative positioning of the abscissas used in a polynomial approximation. 


Exercises 

1. Derive an approximation formula for the first derivative
over the stencil

[Stencil: nodes at $x_0$ and $x_0 + h$; point of evaluation at $x_0 + \frac{1}{2}h$.]

following these steps.

(a) Write down $L_1(x)$, the Lagrange form of the interpolating
polynomial passing through the points
$(x_0, f(x_0))$ and $(x_1, f(x_1))$.

(b) Calculate the derivative $L_1'(x)$.

(c) Substitute $x_0 + \frac{1}{2}h$ for $x$ and $x_0 + h$ for $x_1$ in your
formula from (b) and simplify.

2. Derive an approximation formula for the first derivative
over the stencil

[Stencil: nodes at $x_0$ and $x_0 + h$; point of evaluation at $x_0 + \frac{1}{2}h$.]

following these steps.

(a) Write down $L_1(x(\theta)) = L_1(x_0 + \theta h)$, the Lagrange
form of the interpolating polynomial passing
through the points
$(x_0, f(x_0))$ and $(x_0 + h, f(x_0 + h))$
in terms of $\theta$, $h$, and $x_0$.

(b) Calculate the derivative $\frac{d}{dx}L_1(x(\theta))$. Remember,
$x(\theta) = x_0 + \theta h$, and use the chain rule.

(c) Substitute $\theta = \frac{1}{2}$ into your formula from (b) and
simplify.

3. Derive an approximation formula for the first derivative 
over the stencil 

Xq Xo + h Xq + 2h 

* e ■ > 

Xq + — h 

following these steps. 

(a) Calculate $N_2(x)$, the Newton form of the interpolating
polynomial passing through the points
$(x_0, f(x_0))$, $(x_1, f(x_1))$, and $(x_2, f(x_2))$.

(b) Calculate the derivative $N_2'(x)$.

(c) Substitute the point of evaluation shown in the stencil for $x$, $x_0 + h$ for $x_1$, and $x_0 + 2h$
for $x_2$ in your formula from (b) and simplify.

4. Derive an approximation formula for the second derivative
over the stencil

Xo x 0 +h *0 + 2 h 

9 ^ , 

x a +-h 

following these steps. ^ 

(a) Calculate $N_2(x(\theta)) = N_2(x_0 + \theta h)$, the Newton
form of the interpolating polynomial passing
through the points
$(x_0, f(x_0))$, $(x_0 + h, f(x_0 + h))$,
and $(x_0 + 2h, f(x_0 + 2h))$
in terms of $\theta$, $h$, and $x_0$.

(b) Calculate the second derivative $\frac{d^2}{dx^2}N_2(x(\theta))$. Remember,
$x(\theta) = x_0 + \theta h$, and use the chain rule.

(c) Substitute the value of $\theta$ at the point of evaluation shown in the stencil into your formula from (b) and
simplify.


5. Formula 4.1.5 and the formula you got from question
1 should be different. However, they were derived over
essentially the same stencil: two nodes with the point
of evaluation centered between them. Only the labels
on the stencils were different. In other words, they
were derived from the same geometry, so, in some sense,
must be the same. In question 1, $x_0$ plays the same role
as $x_0 - h$ does in 4.1.5. Moreover, in question 1, the
distance from the point of evaluation to either node is
$\frac{h}{2}$ while in 4.1.5, that distance is $h$. Make the substitution
$x_0$ for $x_0 - h$ in 4.1.5. Then make the substitution
$\frac{h}{2}$ for the $h$ in the denominator of 4.1.5. With these
substitutions, formula 4.1.5 should match exactly the
formula you got in question 1. In other words, different
labelings in a stencil produce different labelings in the
associated formula. Nothing more.

6. Use formula 4.1.6 to approximate the integral. 


(a) 

(b) 


/: 
/> 


sin x dx 


-l 

17 


ru 1 
<C) /, 


-dx ^ 


(d) J (a; 5 — 4) dx 

(e) A 
Jo 


*dx [A1 



!"k/2 

(f) 

/ cos x dx 


J-it/2 

(g) 

r Ux w 

J 1 X 


r 61 

GO 

L ^ ~ x 


7. For each integral in question 6, (i) calculate the integral
exactly, and (ii) calculate the absolute error in the
approximation. [A]

8. Let $f(x) = (x - 1)^2 \sin x$. Use formula 4.1.4 to approximate
$f'(0)$ using


(a) h = 1 

(b) h=\ w 

(c) h=\ 

(d) h=\ 

9. Calculate the absolute error in each approximation of
question 8. Does the error get smaller as $h$ gets smaller? [A]


10. Derive an approximation formula over the stencil 


x 0 


xo + h xq + 2 h 



h 


xo + 3 h 

• > 


(a) for the value of the function. 


(b) for the first derivative. 

(c) for the second derivative. 

(d) for the third derivative. What can you say about 
this formula? 


11. The polynomial $p(x) = 3x^4 - 2x^2 + x - 7$ is an interpolating
polynomial for $f$. Use $p$ to approximate


(a) /(l) 

(b) /( 2) M 

(c) /'(l) 

(d) /'( 2) M 

(e) [ f(x)dx 
Jo 

(f) / f{x)dx [A] 

Jo 

12. The polynomial $q(x) = -7x^4 + 3x^2 - x + 4$ is an interpolating
polynomial for $g$. Use $q$ to approximate

(a) $g(1)$ [A]

(b) $g(2)$

(c) $g'(1)$ [A]

(d) $g'(2)$

(e) / g{x)dx [s] 

Jo 

(f) / g(x)dx 
Jo 


13. Use 4.1.3 to find the formula for the first derivative over 
the stencil 

Xo T h 


(a) 

(b) 

(c) 


x 0 


■a 


1 3 L 

x 0 + - h xo + -h 
xg — h xo xo + 2h 


x 0 - h x 0 


xo + 3 h 
xo + 2 h 

-0 * > 


, ! + V7 , 

xo H — h 


(d) 

(e) 

(f) 

(g) 

00 


xo — h 


x 0 + 2 h x 0 + 3 h 


-ii 


xo 

x 0 - h x 0 


xo + 2 h 


x 0 -h Xo 

# *- 


x 0 + 2 h 


xo xo + h 
— ® 

xo xo + h 

-9 * 


xo + 3h 


x 0 + 3h [A] 

* Jt ' 


14. Find a general approximation formula for the integral 
using two nodes by doing the following. 


(a) Write down the (linear) interpolating polynomial
with nodes $x_0 + \theta_2 h$ and $x_0 + \theta_3 h$.

(b) Integrate the polynomial over the interval $[x_0 + \theta_0 h, x_0 + \theta_1 h]$.

(c) Simplify.

15. Use the general approximation formula you derived in 
question 14 to find an approximation formula over the 
stencil. 

(a) x 0 x 0 + 1 h x 0 + 2 h 

f ^ ] » 

2 4 

(b) Xo Xo + - h Xo + - h Xo + 2 h 


1 4 

(c) x 0 x 0 + -h x 0 + - h xo + 2 h 


(e) 


Xo 

+- 


Xo + h [AJ 

} » 


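16. A general three point formula for the first derivative
using $f(x_0)$, $f(x_0 + \alpha h)$, and $f(x_0 + 2h)$, $\alpha \neq 0$ and
$\alpha \neq 2$, is given by

$$f'(x_0) = \frac{1}{h}\left[-\frac{2 + \alpha}{2\alpha} f(x_0) + \frac{2}{\alpha(2 - \alpha)} f(x_0 + \alpha h) - \frac{\alpha}{2(2 - \alpha)} f(x_0 + 2h)\right] + O(h^2).$$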


(d) x 0 - | h 

Xo 



h 


xq + 2 h 


Use Taylor expansions of $f(x_0 + \alpha h)$ and $f(x_0 + 2h)$ to
derive the given formula.


Answers 

$\int_{x_0 + 2.5h}^{x_0 + 6h} f(x)\,dx$:

$$\frac{1/35}{138240}\left[42056(1.971614474758557) + 201831(1.981091507449763) + 63357(1.993574976724822) + 195902(2.009751835391139) - 28518(2.030278824314539) + 10731(2.055494116570853) - 1519(2.084603181618954)\right]$$

$\int_{x_0 - h}^{x_0 + 6h} f(x)\,dx$:

$$\frac{1/7}{8640}\left[5257(1.96122825291126) - 5880(1.965674866641883) + 59829(1.960870620285721) - 81536(1.941460911122824) + 102459(1.923339403354019) - 50568(1.951091775564697) + 30919(1.9498808918992)\right]$$

4.2 Undetermined Coefficients

The basic idea 

According to equation 3.2.3, the difference between $f$ and an interpolating polynomial is a multiple of $f^{(n+1)}(\xi_x)$.
In other words, the error in approximating $f$ by the interpolating polynomial $P_n$ depends directly on $f^{(n+1)}$. But
$f^{(n+1)}(x)$ is identically zero whenever $f$ is a polynomial of degree less than $n + 1$. Consequently, $(f - P_n)(x)$ is
identically zero in this case. At the risk of sounding redundant, this last thought is worthy of repeating. If $f$ is
any polynomial of degree less than $n + 1$, then $P_n$, computed for any set of $n + 1$ nodes, equals $f$ exactly, for all
$x$. As a result, derivatives of $P_n$ and integrals of $P_n$ are not just approximations of the corresponding derivatives
and integrals of $f$. They are exact because $P_n = f$ for all $x$. This observation can be used to derive formulas for
derivatives and integrals without ever computing $P_n$ or its derivatives or integrals!

All the formulas we have been deriving for approximating derivatives and integrals of the arbitrary function $f$
have taken the form

$$\sum_{i=0}^{n} a_i f(x_i)$$

where $x_0, x_1, \ldots, x_n$ are the nodes of the interpolating polynomial, places where the value of $f$ is known, and the
$a_i$ are constants resulting from the derivation. The Method of Undetermined Coefficients takes a direct approach
to calculating the constants $a_i$. Knowing that the "approximation" formula must be exact for all polynomials of
degree $0, 1, \ldots, n$, we can create $n + 1$ equations in the $n + 1$ unknowns, $a_0, a_1, \ldots, a_n$. The solution of the resulting
system of equations gives the values of the coefficients.

Derivatives 

We seek an approximation of the $k^{\text{th}}$ derivative of $f$ based on knowledge of the values $f(x_0 + \theta_0 h), f(x_0 +
\theta_1 h), \ldots, f(x_0 + \theta_n h)$. To be precise, we desire an approximation of the form

$$f^{(k)}(x_0 + \theta h) \approx \sum_{i=0}^{n} a_i f(x_0 + \theta_i h). \tag{4.2.1}$$

Due to equation 3.2.3, the approximation must be exact for all polynomials of degree $n$ or less. In particular, it
must be exact for the polynomials $p_j(x) = (x - x_0)^j$, $j = 0, 1, \ldots, n$. Symbolically, it must be that

$$p_j^{(k)}(x_0 + \theta h) = \sum_{i=0}^{n} a_i p_j(x_0 + \theta_i h)$$

for $j = 0, 1, \ldots, n$. Notice the approximation has become an (exact) equality. Noting that $p_j(x_0 + \theta_i h) = ((x_0 +
\theta_i h) - x_0)^j = (\theta_i h)^j$, the system of equations becomes

$$p_j^{(k)}(x_0 + \theta h) = \sum_{i=0}^{n} (\theta_i h)^j a_i \tag{4.2.2}$$

for $j = 0, 1, \ldots, n$, where $(\theta_i h)^0$ is read as 1 even when $\theta_i = 0$. It is the solution of this system that will yield the $a_i$.
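System 4.2.2 can be assembled and solved mechanically. The following is a minimal sketch, assuming exact rational arithmetic and a plain Gauss-Jordan elimination; the names `solve_exact` and `derivative_coefficients` are hypothetical helpers, not part of the text. It is applied to the three-node stencil with $\theta_i = -1, 0, 1$ and $\theta = 0$.

```python
from fractions import Fraction
from math import factorial

def solve_exact(A, b):
    """Solve A x = b by Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [row[-1] for row in M]

def derivative_coefficients(thetas, theta, k, h=1):
    """Coefficients a_i in f^(k)(x0 + theta*h) ~ sum a_i f(x0 + thetas[i]*h)."""
    n = len(thetas) - 1
    h = Fraction(h)
    # Row j holds (theta_i * h)^j, the right-hand side of system 4.2.2.
    A = [[(Fraction(t) * h) ** j for t in thetas] for j in range(n + 1)]
    # k-th derivative of (x - x0)^j evaluated at x0 + theta*h.
    b = [
        Fraction(factorial(j), factorial(j - k)) * (Fraction(theta) * h) ** (j - k) if j >= k else 0
        for j in range(n + 1)
    ]
    return solve_exact(A, b)

print(derivative_coefficients([-1, 0, 1], 0, k=1))  # -1/2, 0, 1/2 (with h = 1)
print(derivative_coefficients([-1, 0, 1], 0, k=2))  # 1, -2, 1 (with h = 1)
```

Passing a different `theta` or a different list of `thetas` handles other stencils without any change to the solver.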


Crumpet 25: Vandermonde Matrices 


In general, a system of linear equations may have zero, one, or many solutions. However, system 4.2.2 has a
special form. In each equation, the constants $(\theta_i h)^j$ form a geometric progression. Such a matrix of coefficients
is called a Vandermonde matrix, and it is known that as long as the $\theta_i$ are distinct, this system will have one
solution.
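Written out in matrix form (a sketch consistent with system 4.2.2, where the unknowns are the $a_i$), the coefficient matrix has one column per node, and the powers of $\theta_i h$ run down each column:

```latex
\[
\begin{bmatrix}
1 & 1 & \cdots & 1 \\
\theta_0 h & \theta_1 h & \cdots & \theta_n h \\
(\theta_0 h)^2 & (\theta_1 h)^2 & \cdots & (\theta_n h)^2 \\
\vdots & \vdots & & \vdots \\
(\theta_0 h)^n & (\theta_1 h)^n & \cdots & (\theta_n h)^n
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}
=
\begin{bmatrix}
p_0^{(k)}(x_0 + \theta h) \\ p_1^{(k)}(x_0 + \theta h) \\ p_2^{(k)}(x_0 + \theta h) \\ \vdots \\ p_n^{(k)}(x_0 + \theta h)
\end{bmatrix}
\]
```

The determinant of this matrix is the product of the differences $(\theta_j h - \theta_i h)$ over all $i < j$, which is nonzero exactly when $h \neq 0$ and the $\theta_i$ are distinct.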


To illustrate, suppose we have the stencil

[Stencil: nodes at $x_0 - h$, $x_0$, and $x_0 + h$; point of evaluation at the node $x_0$.]

and are interested in formulas for both the first and second derivatives of $f$ (at $x_0$). For this stencil, $\theta = 0$, $\theta_0 = -1$,
$\theta_1 = 0$, and $\theta_2 = 1$, so we are looking for formulas of the forms

$$f'(x_0) \approx a_0 f(x_0 - h) + a_1 f(x_0) + a_2 f(x_0 + h)$$

and

$$f''(x_0) \approx b_0 f(x_0 - h) + b_1 f(x_0) + b_2 f(x_0 + h).$$

Each of these formulas must be exact when $f = p_0$, when $f = p_1$, and when $f = p_2$. These three requirements give
three equations in the three unknowns.

Beginning with the first derivative formula, we detail system 4.2.2 with $k = 1$ and $n = 2$:

$$p_0'(x_0) = a_0 p_0(x_0 - h) + a_1 p_0(x_0) + a_2 p_0(x_0 + h)$$

$$p_1'(x_0) = a_0 p_1(x_0 - h) + a_1 p_1(x_0) + a_2 p_1(x_0 + h)$$

$$p_2'(x_0) = a_0 p_2(x_0 - h) + a_1 p_2(x_0) + a_2 p_2(x_0 + h)$$

By definition, $p_0(x) = (x - x_0)^0 = 1$ so $p_0'(x_0) = 0$; $p_1(x) = (x - x_0)^1 = x - x_0$ so $p_1'(x_0) = 1$; and $p_2(x) = (x - x_0)^2$
so $p_2'(x) = 2(x - x_0)$ giving $p_2'(x_0) = 0$. Substituting this information into the equations above,

$$0 = a_0 + a_1 + a_2$$

$$1 = -h a_0 + h a_2$$

$$0 = h^2 a_0 + h^2 a_2.$$


The system can be solved by substitution, elimination, or computer algebra system. The solution is $a_0 = -\frac{1}{2h}$,
$a_1 = 0$, and $a_2 = \frac{1}{2h}$, giving the approximation formula

$$f'(x_0) \approx \frac{f(x_0 + h) - f(x_0 - h)}{2h}$$

just as we got on page 133 in formula 4.1.5.

The second derivative formula is derived in the same manner. Since the second derivative formula must be exact
when $f = p_0$, when $f = p_1$, and when $f = p_2$, the $b_i$ must satisfy

$$p_0''(x_0) = b_0 p_0(x_0 - h) + b_1 p_0(x_0) + b_2 p_0(x_0 + h)$$

$$p_1''(x_0) = b_0 p_1(x_0 - h) + b_1 p_1(x_0) + b_2 p_1(x_0 + h)$$

$$p_2''(x_0) = b_0 p_2(x_0 - h) + b_1 p_2(x_0) + b_2 p_2(x_0 + h),$$

system 4.2.2 with $k = 2$ and $n = 2$. Notice the right-hand sides are exactly the same as they are for the first
derivative formula, save the name change from $a_i$ to $b_i$. Only the left-hand side changes substantively. $p_0''(x) = 0$ so
$p_0''(x_0) = 0$; $p_1''(x) = 0$ so $p_1''(x_0) = 0$; and $p_2''(x) = 2$ so $p_2''(x_0) = 2$. Making these substitutions into the equations
above,

$$0 = b_0 + b_1 + b_2$$

$$0 = -h b_0 + h b_2$$

$$2 = h^2 b_0 + h^2 b_2.$$

Again, the system can be solved by substitution, elimination, or computer algebra system. The solution is $b_0 =
b_2 = \frac{1}{h^2}$ and $b_1 = -\frac{2}{h^2}$, giving the approximation formula

$$f''(x_0) \approx \frac{f(x_0 + h) - 2f(x_0) + f(x_0 - h)}{h^2}.$$

Integrals

The idea for estimating integrals is identical to that of estimating derivatives. The mechanics only change nominally.
Where there were derivatives before, we will have integrals now. We seek an approximation of $\int_a^b f(x)\,dx$ based on
knowledge of the values $f(x_0 + \theta_0 h), f(x_0 + \theta_1 h), \ldots, f(x_0 + \theta_n h)$:

$$\int_a^b f(x)\,dx \approx \sum_{i=0}^{n} a_i f(x_0 + \theta_i h). \tag{4.2.3}$$

The approximation will be exact for all polynomials of degree $n$ or less. In particular, it will be exact for $p_j(x) =
(x - x_0)^j$, $j = 0, 1, \ldots, n$. Therefore, the system of equations

$$\int_a^b p_j(x)\,dx = \sum_{i=0}^{n} (\theta_i h)^j a_i, \qquad j = 0, 1, \ldots, n \tag{4.2.4}$$

must be satisfied by the $a_i$.

To illustrate, suppose we have the stencil

[Stencil: nodes at $x_0$, $x_0 + h$, $x_0 + 2h$, $x_0 + 3h$, $x_0 + 4h$, $x_0 + 5h$, and $x_0 + 6h$; left endpoint [ at $x_0 - h$ and right endpoint ] at $x_0 + 6h$.]

For this stencil, $a = x_0 - h$, $b = x_0 + 6h$, and $\theta_i = i$, $i = 0, 1, \ldots, 6$. Therefore, we will have a system of seven
equations in the seven unknowns. First, the left-hand sides:


$$\int_a^b p_0(x)\,dx = \int_{x_0 - h}^{x_0 + 6h} 1\,dx = (x - x_0)\Big|_{x_0 - h}^{x_0 + 6h} = 7h$$

$$\int_a^b p_1(x)\,dx = \int_{x_0 - h}^{x_0 + 6h} (x - x_0)\,dx = \frac{1}{2}(x - x_0)^2\Big|_{x_0 - h}^{x_0 + 6h} = \frac{35}{2}h^2$$

$$\int_a^b p_2(x)\,dx = \int_{x_0 - h}^{x_0 + 6h} (x - x_0)^2\,dx = \frac{1}{3}(x - x_0)^3\Big|_{x_0 - h}^{x_0 + 6h} = \frac{217}{3}h^3$$

$$\int_a^b p_3(x)\,dx = \int_{x_0 - h}^{x_0 + 6h} (x - x_0)^3\,dx = \frac{1}{4}(x - x_0)^4\Big|_{x_0 - h}^{x_0 + 6h} = \frac{1295}{4}h^4$$

$$\int_a^b p_4(x)\,dx = \int_{x_0 - h}^{x_0 + 6h} (x - x_0)^4\,dx = \frac{1}{5}(x - x_0)^5\Big|_{x_0 - h}^{x_0 + 6h} = \frac{7777}{5}h^5$$

$$\int_a^b p_5(x)\,dx = \int_{x_0 - h}^{x_0 + 6h} (x - x_0)^5\,dx = \frac{1}{6}(x - x_0)^6\Big|_{x_0 - h}^{x_0 + 6h} = \frac{46655}{6}h^6$$

$$\int_a^b p_6(x)\,dx = \int_{x_0 - h}^{x_0 + 6h} (x - x_0)^6\,dx = \frac{1}{7}(x - x_0)^7\Big|_{x_0 - h}^{x_0 + 6h} = 39991h^7.$$


Now putting them together with the right-hand sides (and swapping sides):

$$\sum_{i=0}^{6} (\theta_i h)^0 a_i = a_0 + a_1 + a_2 + a_3 + a_4 + a_5 + a_6 = 7h$$

$$\sum_{i=0}^{6} (\theta_i h)^1 a_i = h a_1 + 2h a_2 + 3h a_3 + 4h a_4 + 5h a_5 + 6h a_6 = \frac{35}{2}h^2$$

$$\sum_{i=0}^{6} (\theta_i h)^2 a_i = h^2 a_1 + 4h^2 a_2 + 9h^2 a_3 + 16h^2 a_4 + 25h^2 a_5 + 36h^2 a_6 = \frac{217}{3}h^3$$

$$\sum_{i=0}^{6} (\theta_i h)^3 a_i = h^3 a_1 + 8h^3 a_2 + 27h^3 a_3 + 64h^3 a_4 + 125h^3 a_5 + 216h^3 a_6 = \frac{1295}{4}h^4$$

$$\sum_{i=0}^{6} (\theta_i h)^4 a_i = h^4 a_1 + 16h^4 a_2 + 81h^4 a_3 + 256h^4 a_4 + 625h^4 a_5 + 1296h^4 a_6 = \frac{7777}{5}h^5$$

$$\sum_{i=0}^{6} (\theta_i h)^5 a_i = h^5 a_1 + 32h^5 a_2 + 243h^5 a_3 + 1024h^5 a_4 + 3125h^5 a_5 + 7776h^5 a_6 = \frac{46655}{6}h^6$$

$$\sum_{i=0}^{6} (\theta_i h)^6 a_i = h^6 a_1 + 64h^6 a_2 + 729h^6 a_3 + 4096h^6 a_4 + 15625h^6 a_5 + 46656h^6 a_6 = 39991h^7$$


The system again may be solved by substitution, elimination, or computer algebra, at least in principle. Not many
humans have sufficient patience and precision to solve such a system with paper and pencil, though. Trusting a
computer algebra system, the solution is $a_0 = \frac{30919}{8640}h$, $a_1 = -\frac{2107}{360}h$, $a_2 = \frac{34153}{2880}h$, $a_3 = -\frac{1274}{135}h$, $a_4 = \frac{19943}{2880}h$,
$a_5 = -\frac{49}{72}h$, and $a_6 = \frac{5257}{8640}h$, giving the approximation formula

$$\int_{x_0 - h}^{x_0 + 6h} f(x)\,dx \approx \frac{h}{8640}\left[5257 f(x_0 + 6h) - 5880 f(x_0 + 5h) + 59829 f(x_0 + 4h) - 81536 f(x_0 + 3h) + 102459 f(x_0 + 2h) - 50568 f(x_0 + h) + 30919 f(x_0)\right] \tag{4.2.5}$$

just as we got on page 134 in formula 4.1.6.
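The seven-by-seven system above is tedious by hand but small for a machine. Here is a hedged sketch in exact rational arithmetic; `solve_exact` and `integral_coefficients` are hypothetical helper names, and the sketch sets $h = 1$, relying on the fact that each $a_i$ in system 4.2.4 scales linearly with $h$.

```python
from fractions import Fraction

def solve_exact(A, b):
    """Solve A x = b by Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [row[-1] for row in M]

def integral_coefficients(thetas, lo, hi):
    """Values a_i / h for the rule over [x0 + lo*h, x0 + hi*h] with nodes x0 + thetas[i]*h."""
    n = len(thetas) - 1
    A = [[Fraction(t) ** j for t in thetas] for j in range(n + 1)]
    # Exact integral of (x - x0)^j over the interval, computed with h = 1.
    b = [(Fraction(hi) ** (j + 1) - Fraction(lo) ** (j + 1)) / (j + 1) for j in range(n + 1)]
    return solve_exact(A, b)

# Nodes x0, x0+h, ..., x0+6h; interval [x0 - h, x0 + 6h].
coeffs = integral_coefficients(range(7), -1, 6)
for i, c in enumerate(coeffs):
    print(f"a_{i} = ({c}) h")
```

Changing the arguments to `integral_coefficients(range(7), Fraction(5, 2), 6)` sets up the stencil with left endpoint $x_0 + 2.5h$ instead.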


Practical considerations 

We have used stencils like

[Stencil: nodes at $x_0 - h$, $x_0$, and $x_0 + h$; point of evaluation at $x_0 - \frac{1}{6}h$.]

and

[Stencil: nodes at $x_0$, $x_0 + h$, $x_0 + 2h$, $x_0 + 3h$, $x_0 + 4h$, $x_0 + 5h$, and $x_0 + 6h$; left endpoint [ at $x_0 - h$ and right endpoint ] at $x_0 + 6h$.]

not because the results are particularly helpful, but rather to (a) illustrate the methods and (b) emphasize that these
methods work in general for any stencil you may dream up. Most of the differentiation and integration formulas
presented in numerical analysis sources stick to a small host of regularly spaced stencils where, for derivatives, the
point of evaluation is a node, and for integrals, all the nodes lie between the endpoints or there are nodes at both
endpoints. It is possible the regularly-spaced stencils are all you will ever need, but it is good to know that you can
derive appropriate formulas for more unusual stencils should the need arise.

As for their derivation, the main advantage of the method of undetermined coefficients over working directly
with interpolating polynomials is the ease of automation and lessening of the necessary and often laborious algebra
needed. In the method of undetermined coefficients, the only polynomials that need to be differentiated or integrated
are the polynomials $p_j(x) = (x - x_0)^j$, a much simpler task than integrating or differentiating interpolating polynomials.
Formulas with up to three or four nodes can be handled this way with pencil and paper. The trade-off is the necessity
of solving a system of equations, again a simpler task than differentiating and simplifying interpolating polynomials
of degree 3 or 4. As a final benefit to the method of undetermined coefficients, it is a general solution technique
used not only in numerical analysis for deriving calculus approximations, but in other studies as well, particularly
differential equations. The method is applicable whenever the form of a solution or formula is known, but the
constants (coefficients) remain a mystery.


Crumpet 26: Undetermined Coefficients in Differential Equations 


In differential equations, we know that a particular solution of the equation

$$y - 2y' + 3y'' = 5\sin x \tag{4.2.6}$$

has the form $y = A\sin x + B\cos x$, but we do not immediately know the values of $A$ and $B$. They are undetermined
coefficients (at this point). They are determined by substituting the known form into the equation being solved.

$$y' = A\cos x - B\sin x$$

$$y'' = -A\sin x - B\cos x$$

So the equation being solved becomes

$$(A\sin x + B\cos x) - 2(A\cos x - B\sin x) + 3(-A\sin x - B\cos x) = 5\sin x.$$

Collecting the coefficients of $\sin x$ and $\cos x$ on the left side,

$$(-2A + 2B)\sin x + (-2A - 2B)\cos x = 5\sin x.$$

We now match coefficients on left and right sides to get the system of equations

$$-2A + 2B = 5$$

$$-2A - 2B = 0$$

whose solution is $A = -\frac{5}{4}$ and $B = \frac{5}{4}$. Therefore, $y = -\frac{5}{4}\sin x + \frac{5}{4}\cos x$ solves equation 4.2.6.

Conceptually, this process is no different from the method of undetermined coefficients used in deriving 
numerical calculus formulas. The solution to some problem is known, save for some (undetermined) coefficients. 
The parameters of the problem require the coefficients to satisfy some system of linear equations. The system is 
solved, and the solution to the original problem is consequently known completely, coefficients determined. 
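The determined coefficients in Crumpet 26 are easy to sanity-check numerically. This short sketch (the function name `residual` is an illustrative choice) substitutes $A = -\frac{5}{4}$, $B = \frac{5}{4}$ into equation 4.2.6 at a spread of sample points and reports the largest discrepancy.

```python
import math

A, B = -5 / 4, 5 / 4  # solution of -2A + 2B = 5, -2A - 2B = 0

def residual(x):
    """Left side minus right side of y - 2y' + 3y'' = 5 sin x for y = A sin x + B cos x."""
    y = A * math.sin(x) + B * math.cos(x)
    dy = A * math.cos(x) - B * math.sin(x)
    d2y = -A * math.sin(x) - B * math.cos(x)
    return y - 2 * dy + 3 * d2y - 5 * math.sin(x)

# The residual should vanish up to rounding error at every sample point.
print(max(abs(residual(0.1 * k)) for k in range(100)))
```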


When we get involved with stencils with more than 3 or 4 nodes, solving the resulting (relatively large) system of 
linear equations by hand is not a task to which most of us would look forward. However, it is a standard calculation 
any computer algebra system can do easily and efficiently. Yes, it is advisable to use a computer algebra system to 
derive formulas as complicated as 4.1.6. We have used Maxima¹ to handle or double check a number of the more
tedious calculations presented in this text.


Crumpet 27: wxMaxima 


The best way to solve a large system of linear equations is with the aid of a computer algebra system. Figure
4.2.1 shows how wxMaxima may be used to derive formula 4.2.5.

Notice the similarities between Maxima code and Octave code. Maxima allows for statements, print statements,
variable assignments, arrays, and suppression of output. The syntax for these things is not the same, but

¹See http://maxima.sourceforge.net/

Figure 4.2.1: wxMaxima deriving an integration formula

[wxMaxima 13.04.2 session, file integralFormula.wxm]

    (%i1) a: x0-h$
          b: x0+6*h$
          p: makelist((x-x0)^j, j, 0, 6)$
          for j:1 thru 7 do (
              eq[j]: ratsimp(integrate(p[j],x,a,b)) = sum(c[i]*subst(x=x0+(i-1)*h,p[j]),i,1,7)
          )$
          eqs: [eq[1],eq[2],eq[3],eq[4],eq[5],eq[6],eq[7]];
          vars: [c[1],c[2],c[3],c[4],c[5],c[6],c[7]]$
          soln: solve(eqs,vars);
          approx: factor(subst(soln,sum(c[i]*f(x0+(i-1)*h),i,1,7)))$
          print('integrate(f(x),x,a,b),"~",approx)$

    (%o5) [7 h = c7 + c6 + c5 + c4 + c3 + c2 + c1,
           35 h^2/2 = 6 c7 h + 5 c6 h + 4 c5 h + 3 c4 h + 2 c3 h + c2 h,
           217 h^3/3 = 36 c7 h^2 + 25 c6 h^2 + 16 c5 h^2 + 9 c4 h^2 + 4 c3 h^2 + c2 h^2,
           1295 h^4/4 = 216 c7 h^3 + 125 c6 h^3 + 64 c5 h^3 + 27 c4 h^3 + 8 c3 h^3 + c2 h^3,
           7777 h^5/5 = 1296 c7 h^4 + 625 c6 h^4 + 256 c5 h^4 + 81 c4 h^4 + 16 c3 h^4 + c2 h^4,
           46655 h^6/6 = 7776 c7 h^5 + 3125 c6 h^5 + 1024 c5 h^5 + 243 c4 h^5 + 32 c3 h^5 + c2 h^5,
           39991 h^7 = 46656 c7 h^6 + 15625 c6 h^6 + 4096 c5 h^6 + 729 c4 h^6 + 64 c3 h^6 + c2 h^6]

    (%o7) [[c1 = 30919 h/8640, c2 = -2107 h/360, c3 = 34153 h/2880, c4 = -1274 h/135,
            c5 = 19943 h/2880, c6 = -49 h/72, c7 = 5257 h/8640]]

    integral of f(x) dx from x0-h to x0+6 h ~ (7 h (751 f(x0+6 h) - 840 f(x0+5 h) + 8547 f(x0+4 h) - 11648 f(x0+3 h)
        + 14637 f(x0+2 h) - 7224 f(x0+h) + 4417 f(x0)))/8640

the principles behind them are. Once you have learned how to do these things in one language, learning how to 
do them in another is usually straightforward. 

Also notice the main difference between Maxima and Octave. Maxima was designed for symbolic manipulation 
while Octave was designed for numerical computation. Octave can be made to do symbolic calculation and 
Maxima can be made to do numerical computation, but the old carpenter’s adage “use the right tool for the 
job” is worth consideration. Maxima is much more adept at symbolic manipulation than is Octave, and Octave 
is much more adept at number crunching than is Maxima. 


Reference 

http://andrejv.github.io/wxmaxima/


It is unusual to use stencils with more than five nodes anyway. It is not because the formulas for more nodes 
are significantly more complicated or difficult to use, however. As evidenced by formula 3.2.3, the error term for 
an interpolating polynomial involves higher and higher derivatives of / as more nodes are added. This is generally 
fine as long as / has sufficiently many derivatives and the values of the high derivatives are not prohibitively 
large. However, numerical methods are often employed when the smoothness of / is known to be limited, the high 
derivatives are known to be large, or the properties of its derivatives are unknown completely. For these functions, 
stencils with fewer nodes, which give rise to formulas with lower order error terms, are often more accurate, not
less. And in the case of unknown smoothness, the lower order methods have a better chance of being accurate.

As a final note, some care must be taken not to ask too much of a derivative formula. With $n + 1$ nodes, the error
term for the interpolating polynomial involves $f^{(n+1)}$, so there is no hope of using these nodes to estimate $f^{(n+1)}$
or any higher derivatives at any point. If you, however, forget this fact, it shows up in a direct way in the method
of undetermined coefficients. If $k > n$, then the system of equations with undetermined coefficients becomes

$$\sum_{i=0}^{n} (\theta_i h)^j a_i = 0, \qquad j = 0, 1, \ldots, n$$

because the $k^{\text{th}}$ derivative of $p_j$ is identically 0 for all $j \le n < k$. The only solution to this system is $a_0 = a_1 =
\cdots = a_n = 0$, giving the "approximation" formula

$$f^{(k)}(x_0 + \theta h) \approx 0.$$

Indeed, this is exact for all polynomials of degree $n$ or less. However, the error in using this formula is exactly
$f^{(k)}(x_0 + \theta h)$, a relative error of exactly 1, making it completely useless.


Stability 

In Experiment 2 on page 3, section 1.1, we took a brief look at approximating the first derivative of $f(x) = \sin x$
using the fact that

\[ f'(1) = \lim_{h \to 0} \frac{\sin(1+h) - \sin(1-h)}{2h}. \]

The conclusion we drew was that this computation was highly susceptible to floating-point error. If calculations
are done exactly, then we expect $\frac{\sin(1+h) - \sin(1-h)}{2h}$ to approximate $f'(1)$ better and better as $h$ becomes smaller and
smaller. Not so for floating-point calculations, as the experiment revealed. There was a point at which making
$h$ smaller made the approximation worse! And this example is not unique. This problem always arises when
approximating $f'$ using the centered difference formula

\[ f'(x) \approx \frac{f(x+h) - f(x-h)}{2h}. \tag{4.2.7} \]


But how can we predict at what value of h that might happen without comparing our results to the exact value of 
the derivative? After all, numerical differentiation is employed most often when the exact formula for the derivative 
is unknown or prohibitively difficult to compute. 

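Suppose $f$ can be computed to near machine precision. In typical floating-point calculations, including Octave,
that means a relative floating-point error of approximately $10^{-15}$, or an absolute floating-point error $\varepsilon_f \approx 10^{-15}|f(x)|$.
Since we assume $h$ is small, we can approximate both $|\tilde f(x+h) - f(x+h)|$ and $|\tilde f(x-h) - f(x-h)|$ by $\varepsilon_f$,
where $\tilde f$ denotes the computed (floating-point) value of $f$, giving
an absolute error of approximately $2\varepsilon_f$ in calculating the numerator $f(x+h) - f(x-h)$. Assuming $h$ is calculated
exactly, the absolute floating-point error in the computed difference quotient is

\[ \varepsilon_r \approx \frac{2\varepsilon_f}{2h} = \frac{\varepsilon_f}{h} \approx \frac{|f(x)|}{10^{15}} \cdot \frac{1}{h}. \tag{4.2.8} \]

As we will see shortly, the algorithmic error, $\varepsilon_a$, is caused by truncation and has magnitude $\frac{h^2}{6}|f'''(\xi)|$ for some value of $\xi$
near $x$. Since $\xi$ is near $x$, we approximate $f'''(\xi)$ by $f'''(x)$ and conclude that

\[ \varepsilon_a \approx \frac{|f'''(x)|}{6} h^2. \tag{4.2.9} \]

We now minimize the value of $\varepsilon_r + \varepsilon_a$ by setting its derivative (with respect to $h$) equal to zero and solving the
resulting equation:

\begin{align*}
0 = \frac{d}{dh}(\varepsilon_r + \varepsilon_a) &\approx \frac{d}{dh}\left( \frac{|f(x)|}{10^{15}} \cdot \frac{1}{h} + \frac{|f'''(x)|}{6} h^2 \right) \\
&= -\frac{|f(x)|}{10^{15}} \cdot \frac{1}{h^2} + \frac{|f'''(x)|}{3} h \\
\frac{|f'''(x)|}{3} h &= \frac{|f(x)|}{10^{15}} \cdot \frac{1}{h^2} \\
h^3 &= \frac{|f(x)|}{|f'''(x)|} \cdot \frac{3}{10^{15}} \\
h &= \sqrt[3]{\frac{3|f(x)|}{|f'''(x)|}} \cdot 10^{-5}.
\end{align*}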

For Experiment 2 on page 3, this means we should expect the optimal value of $h$ to be around $\sqrt[3]{3} \cdot 10^{-5} \approx
1.44(10)^{-5}$ (taking $|f(1)/f'''(1)| \approx 1$; the exact ratio, $\tan 1 \approx 1.56$, gives about $1.67(10)^{-5}$, the same order of magnitude).
We reproduce the table from Experiment 2 here with the addition of a third column, the actual absolute
error:

h          p*(h)                 absolute error
10^{-2}    0.5402933008747335    9.00(10)^{-6}
10^{-3}    0.5403022158176896    9.00(10)^{-8}
10^{-4}    0.5403023049677103    9.00(10)^{-10}
10^{-5}    0.5403023058569989    1.11(10)^{-11}
10^{-6}    0.5403023058958567    2.77(10)^{-11}
10^{-7}    0.5403023056738121    1.94(10)^{-10}

Indeed, when $h = 10^{-5}$, we get our best results! However, the prediction of the optimal value of $h$ was based on
knowledge of $f'''$, something we generally will not have. Unless we happen to know that $\sqrt[3]{3|f(x)|/|f'''(x)|}$ is far from
1, we assume it is reasonably close to 1, in which case the optimal value of $h$ is around $10^{-5}$. Similar estimates can
be made for other derivative formulas.
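The trade-off between floating-point error and truncation error can be observed directly. The following is a minimal sketch in Python (the text works in Octave; Python is used here only for illustration), assuming IEEE double precision and a relative evaluation error of about $10^{-15}$. The function names `centered_difference`, `predicted_optimal_h`, and `error_table` are made up for this example.

```python
import math

def centered_difference(f, x, h):
    """Centered difference (f(x+h) - f(x-h)) / (2h), formula (4.2.7)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def predicted_optimal_h(f_value, f3_value, rel_eps=1e-15):
    """Step size minimizing eps_r + eps_a: h^3 = 3 |f| rel_eps / |f'''|.

    rel_eps is the assumed relative floating-point error in evaluating f.
    """
    return (3.0 * abs(f_value) * rel_eps / abs(f3_value)) ** (1.0 / 3.0)

def error_table(f, fprime_exact, x, exponents):
    """Return rows (h, approximation, absolute error) for h = 10**(-k)."""
    rows = []
    for k in exponents:
        h = 10.0 ** (-k)
        approx = centered_difference(f, x, h)
        rows.append((h, approx, abs(approx - fprime_exact)))
    return rows

if __name__ == "__main__":
    x = 1.0
    for h, approx, err in error_table(math.sin, math.cos(x), x, range(2, 11)):
        print(f"{h:8.0e}  {approx:.16f}  {err:.2e}")
    # For f = sin, the third derivative is -cos.
    print("predicted optimal h:", predicted_optimal_h(math.sin(x), -math.cos(x)))
```

Running the script prints the error for a range of step sizes; the error should shrink until $h$ is near $10^{-5}$ and then grow again as rounding dominates.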

Because numerical differentiation is so sensitive to floating-point error, we say that it is unstable. The root
finding methods and numerical integration we have discussed are all stable methods. Their sensitivity to floating-point
error is commensurate with that of calculating $f$.

Key Concepts 

undetermined coefficients: A method for solving problems in which the solution is known save for a set of 
(undetermined) coefficients. 


Exercises 

1. Using the method of undetermined coefficients, derive
an approximation formula for the first derivative over
the stencil.

[The stencil diagrams for the parts of this exercise are not recoverable from the extracted text.]

2. Using the method of undetermined coefficients, derive
an approximation formula for the second derivative
over the stencil.

[The stencil diagrams for the parts of this exercise are not recoverable from the extracted text.]

3. Use the method of undetermined coefficients to derive
an approximation formula over the stencil

[stencil diagram: nodes $x_0$, $x_0 + h$, $x_0 + 2h$, $x_0 + 3h$; the marked point of evaluation is not legible in the extracted text]

(a) for the value of the function.

(b) for the first derivative.

(c) for the second derivative.

(d) for the third derivative. What can you say about
this formula?

(e) Compare the method of undetermined coefficients
to the direct method employed in question 10 of
section 4.1.

4. Use the method of undetermined coefficients to derive
an approximation formula for the integral over the stencil.

[The stencil diagrams for parts (a) through (m) are not recoverable from the extracted text.]

5. Using the method of undetermined coefficients, find a
general approximation formula for $\int_{x_0+\theta_0 h}^{x_0+\theta_1 h} f(x)\,dx$ using
the two nodes $x_0 + \theta_1 h$ and $x_0 + \theta_2 h$.

4.3 Error Analysis

Errors for first derivative formulas 

In section 3.2, we found that if $f$ has sufficient derivatives, then $f$ and $P_n$, an interpolating polynomial of degree
at most $n$, differ according to equation 3.2.3 on page 107, copied here for convenience:

\[ f(x) - P_n(x) = \frac{f^{(n+1)}(\xi_x)}{(n+1)!}(x - x_0)(x - x_1) \cdots (x - x_n). \]

We can use this formula to derive a concise formula for the error in approximating $f'(x)$ by $P_n'(x)$.

As done in section 3.2, suppose $n \ge 1$ and $x_0, x_1, \ldots, x_n$ are $n + 1$ distinct real numbers. Set $w(x) = (x - x_0)(x -
x_1) \cdots (x - x_n)$, $a = \min(x_0, \ldots, x_n, x)$, and $b = \max(x_0, \ldots, x_n, x)$. We know from equation 3.2.3 that, assuming
$f$ has $n + 1$ derivatives on $(a, b)$ and $f, f', \ldots, f^{(n)}$ are all continuous on $[a, b]$, for each $x \in [a, b]$,

\[ f(x) - P_n(x) = \frac{f^{(n+1)}(\xi_x)}{(n+1)!} w(x) \]

for some $\xi_x \in (a, b)$. Hence,

\[ f'(x) - P_n'(x) = \frac{d}{dx}\left[\frac{f^{(n+1)}(\xi_x)}{(n+1)!}\right] w(x) + \frac{f^{(n+1)}(\xi_x)}{(n+1)!} w'(x). \]

Since $w$ vanishes at each node, this formula simplifies nicely when $x$ is a node. Without loss of generality, we
evaluate for $x = x_0$ and get

\[ f'(x_0) - P_n'(x_0) = \frac{f^{(n+1)}(\xi_{x_0})}{(n+1)!} w'(x_0). \]

From here on, the error formula is only valid at a node! This last expression can be simplified further by noting
that

\[ w'(x) = \sum_{i=0}^{n} \prod_{\substack{j=0 \\ j \ne i}}^{n} (x - x_j) = \sum_{i=0}^{n} p_i(x) \]

where $p_i$ is as defined for equation 3.2.2 on page 106. But $p_i(x_0) = 0$ for all $i$ except $i = 0$, so

\[ w'(x_0) = p_0(x_0) = (x_0 - x_1)(x_0 - x_2) \cdots (x_0 - x_n). \]

Substituting this expression for $w'$, we have the first derivative error formula

\[ f'(x_0) - P_n'(x_0) = \frac{f^{(n+1)}(\xi_{x_0})}{(n+1)!}(x_0 - x_1)(x_0 - x_2) \cdots (x_0 - x_n). \]

Making the substitutions $x_0 + \theta_i h$ for $x_i$, $i = 1, 2, \ldots, n$, we get a formula in terms of $h$ and the $\theta_i$:

\[ f'(x_0) - P_n'(x_0) = \frac{f^{(n+1)}(\xi_h)}{(n+1)!}(-\theta_1 h)(-\theta_2 h) \cdots (-\theta_n h). \]

This error formula simplifies just a bit:

\[ f'(x_0) - P_n'(x_0) = \frac{f^{(n+1)}(\xi_h)}{(n+1)!}\,\theta_1 \theta_2 \cdots \theta_n (-h)^n. \tag{4.3.1} \]

For the stencil with nodes at $-1$, $0$, $1$, $2$, and $3$ and point of evaluation $0$,
$n = 4$, $\theta_1 = -1$, $\theta_2 = 1$, $\theta_3 = 2$, and $\theta_4 = 3$, so the error in calculating $f'$ over this stencil is

\[ \frac{f^{(5)}(\xi_h)}{120}(-1)(1)(2)(3)(-h)^4 = -\frac{h^4}{20} f^{(5)}(\xi_h). \]

Error terms for the first derivative over other stencils are computed similarly as long as the derivative is evaluated
at a node. Table 4.2 summarizes some common first derivative formulas, including error terms.

Notice that the error term contains $(x_0 - x_1)(x_0 - x_2) \cdots (x_0 - x_n)$, the product of the differences between the
point of evaluation and all other nodes, as a factor. When the differences between the point of evaluation and
the other nodes are small, the product is small. Consequently, first derivative approximation formulas are generally
more accurate when the point of evaluation is centrally located among the nodes. Hence, we might expect a first
derivative formula involving nodes $x_0 < x_1 < x_2$ to be more accurate when the point of evaluation is $x_1$ rather
than when the point of evaluation is $x_0$ or $x_2$. The same can be said about higher derivative formulas. The more
centrally located the point of evaluation, the more accurate the approximation.
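The coefficient in error formula 4.3.1 is simple enough to compute exactly. The sketch below, in Python with exact rational arithmetic, evaluates $\theta_1 \theta_2 \cdots \theta_n (-1)^n / (n+1)!$ for a given stencil; the function names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from math import factorial

def first_derivative_error_coefficient(thetas):
    """Coefficient c in  f'(x0) - P_n'(x0) = c * h**n * f^(n+1)(xi_h).

    thetas lists theta_1, ..., theta_n for the nodes x0 + theta_i*h other
    than the point of evaluation x0 (which must itself be a node).
    """
    n = len(thetas)
    product = Fraction(1)
    for theta in thetas:
        product *= Fraction(theta)
    return product * (-1) ** n / factorial(n + 1)

def coefficient_at_node(nodes, k):
    """Same coefficient when the point of evaluation is nodes[k]."""
    others = [Fraction(t) - Fraction(nodes[k])
              for i, t in enumerate(nodes) if i != k]
    return first_derivative_error_coefficient(others)

if __name__ == "__main__":
    # The five-node stencil from the text, evaluated at theta = 0.
    print(first_derivative_error_coefficient([-1, 1, 2, 3]))
    # Three equally spaced nodes: compare endpoint and central evaluation.
    for k in range(3):
        print(k, coefficient_at_node([0, 1, 2], k))
```

For three equally spaced nodes, the coefficient at the central node is half the size of the coefficient at either end node, which illustrates the remark about centrally located points of evaluation.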


Errors for other formulas 


It is tempting to think we can simply repeat the procedure we used with first derivatives, taking the second
derivative of $f(x) - P_n(x) = \frac{f^{(n+1)}(\xi_x)}{(n+1)!} w(x)$ to find the error for second derivative estimates, and the third derivative
of $f(x) - P_n(x) = \frac{f^{(n+1)}(\xi_x)}{(n+1)!} w(x)$ to find the error for third derivative estimates, and so on. Alas, the matter is
not so simple. Higher derivatives of $f(x) - P_n(x) = \frac{f^{(n+1)}(\xi_x)}{(n+1)!} w(x)$ involve derivatives of the factor $\frac{f^{(n+1)}(\xi_x)}{(n+1)!}$, which
do not vanish even when $x$ is a node. Since $\xi_x$ is entirely unknown, so are its derivatives, making this approach
unworkable. Other methods for producing precise bounds for certain higher derivative formulas or certain integral
formulas are limited in scope.

There is, however, a general method for determining good enough error terms for any derivative or integral
formula. We replace each evaluation of $f$ in the approximation by a Taylor series expanded about $x_0$ and simplify.
This gives an expression for the approximation in terms of $f(x_0)$, $f'(x_0)$, $f''(x_0)$, and so on. We compare it to
the Taylor series representation of the quantity being estimated. The difference between the two is the error. In
summary, that’s it. Making a rigorous argument of this method takes some care and is worthy of an example. We
demonstrate the method for the approximation of the first derivative over the stencil with nodes $x_0 - h$, $x_0$, and
$x_0 + h$ and point of evaluation $x_0 - \frac{1}{6}h$.

Again, we choose this stencil not because the stencil is generally useful, but rather to emphasize that the method is
generally useful.

In subsection 4.1 on page 132, we derived the approximation

\[ f'\left(x_0 - \tfrac{1}{6}h\right) \approx \frac{-2f(x_0 - h) + f(x_0) + f(x_0 + h)}{3h}. \tag{4.3.2} \]

The left hand side, the quantity being approximated, as a Taylor series looks like

\[ f'\left(x_0 - \tfrac{1}{6}h\right) = f'(x_0) - \tfrac{1}{6}hf''(x_0) + \tfrac{1}{72}h^2 f'''(x_0) - \tfrac{1}{1296}h^3 f^{(4)}(x_0) + \cdots. \]

The terms of the right hand side, the approximation, as Taylor series look like

\begin{align*}
f(x_0 - h) &= f(x_0) - hf'(x_0) + \tfrac{1}{2}h^2 f''(x_0) - \tfrac{1}{6}h^3 f'''(x_0) + \tfrac{1}{24}h^4 f^{(4)}(x_0) - \cdots \\
f(x_0) &= f(x_0) \\
f(x_0 + h) &= f(x_0) + hf'(x_0) + \tfrac{1}{2}h^2 f''(x_0) + \tfrac{1}{6}h^3 f'''(x_0) + \tfrac{1}{24}h^4 f^{(4)}(x_0) + \cdots.
\end{align*}

We now substitute these Taylor series into the right hand side of 4.3.2 and simplify. To facilitate the algebra, we
begin by summing $-2f(x_0 - h) + f(x_0) + f(x_0 + h)$:

\begin{align*}
-2f(x_0 - h) &= -2f(x_0) + 2hf'(x_0) - h^2 f''(x_0) + \tfrac{1}{3}h^3 f'''(x_0) - \tfrac{1}{12}h^4 f^{(4)}(x_0) + \cdots \\
f(x_0) &= f(x_0) \\
f(x_0 + h) &= f(x_0) + hf'(x_0) + \tfrac{1}{2}h^2 f''(x_0) + \tfrac{1}{6}h^3 f'''(x_0) + \tfrac{1}{24}h^4 f^{(4)}(x_0) + \cdots \\
-2f(x_0 - h) + f(x_0) + f(x_0 + h) &= 3hf'(x_0) - \tfrac{1}{2}h^2 f''(x_0) + \tfrac{1}{2}h^3 f'''(x_0) - \tfrac{1}{24}h^4 f^{(4)}(x_0) + \cdots.
\end{align*}

Hence, we have

\begin{align*}
\frac{-2f(x_0 - h) + f(x_0) + f(x_0 + h)}{3h} &= \frac{3hf'(x_0) - \tfrac{1}{2}h^2 f''(x_0) + \tfrac{1}{2}h^3 f'''(x_0) - \tfrac{1}{24}h^4 f^{(4)}(x_0) + \cdots}{3h} \\
&= f'(x_0) - \tfrac{1}{6}hf''(x_0) + \tfrac{1}{6}h^2 f'''(x_0) - \tfrac{1}{72}h^3 f^{(4)}(x_0) + \cdots.
\end{align*}

For the error, $e(h) = f'\left(x_0 - \tfrac{1}{6}h\right) - \frac{-2f(x_0 - h) + f(x_0) + f(x_0 + h)}{3h}$, we then get

\begin{align*}
e(h) &= \left( f'(x_0) - \tfrac{1}{6}hf''(x_0) + \tfrac{1}{72}h^2 f'''(x_0) - \tfrac{1}{1296}h^3 f^{(4)}(x_0) + \cdots \right) \\
&\quad - \left( f'(x_0) - \tfrac{1}{6}hf''(x_0) + \tfrac{1}{6}h^2 f'''(x_0) - \tfrac{1}{72}h^3 f^{(4)}(x_0) + \cdots \right) \\
&= -\tfrac{11}{72}h^2 f'''(x_0) + \tfrac{17}{1296}h^3 f^{(4)}(x_0) + \cdots.
\end{align*}

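We now know that we have an error of the form $O(h^2 f'''(\xi_h))$, the form of the remaining term with least degree,
but we do not have rigorous proof of that fact. Think of what has been done so far as discovery. Now that we know
the $f'''$ terms do not cancel, we go back and truncate all the Taylor series after the $f''$ terms, replacing higher order
derivatives with an error term, and “redo” the algebra. We thus have

\begin{align*}
f'\left(x_0 - \tfrac{1}{6}h\right) &= f'(x_0) - \tfrac{1}{6}hf''(x_0) + \tfrac{1}{72}h^2 f'''(\xi_1) \\
f(x_0 - h) &= f(x_0) - hf'(x_0) + \tfrac{1}{2}h^2 f''(x_0) - \tfrac{1}{6}h^3 f'''(\xi_2) \\
f(x_0) &= f(x_0) \\
f(x_0 + h) &= f(x_0) + hf'(x_0) + \tfrac{1}{2}h^2 f''(x_0) + \tfrac{1}{6}h^3 f'''(\xi_3)
\end{align*}

where $\xi_1 \in (x_0 - \tfrac{1}{6}h, x_0)$, $\xi_2 \in (x_0 - h, x_0)$, and $\xi_3 \in (x_0, x_0 + h)$. And now when we compute
$e(h) = f'\left(x_0 - \tfrac{1}{6}h\right) - \frac{-2f(x_0 - h) + f(x_0) + f(x_0 + h)}{3h}$, we know all the terms involving $f$, $f'$, and $f''$ vanish. The only terms left are those
involving $f'''$:

\begin{align*}
e(h) &= \tfrac{1}{72}h^2 f'''(\xi_1) - \frac{-2\left(-\tfrac{1}{6}h^3 f'''(\xi_2)\right) + \tfrac{1}{6}h^3 f'''(\xi_3)}{3h} \\
&= \tfrac{1}{72}h^2 f'''(\xi_1) - \tfrac{1}{9}h^2 f'''(\xi_2) - \tfrac{1}{18}h^2 f'''(\xi_3) \\
&= \frac{h^2}{9}\left[\tfrac{1}{8}f'''(\xi_1) - f'''(\xi_2) - \tfrac{1}{2}f'''(\xi_3)\right].
\end{align*}

The final formality is that of converting this expression into big-oh notation:

\begin{align*}
|e(h)| &= \frac{h^2}{9}\left|\tfrac{1}{8}f'''(\xi_1) - f'''(\xi_2) - \tfrac{1}{2}f'''(\xi_3)\right| \\
&\le \frac{h^2}{9}\left[\tfrac{1}{8}|f'''(\xi_1)| + |f'''(\xi_2)| + \tfrac{1}{2}|f'''(\xi_3)|\right] \\
&\le \frac{h^2}{9} \cdot \frac{13}{8} \max\{|f'''(\xi_1)|, |f'''(\xi_2)|, |f'''(\xi_3)|\} \\
&= h^2 \cdot M|f'''(\xi_h)|
\end{align*}

for some $\xi_h \in (x_0 - h, x_0 + h)$ and $M = \frac{13}{72}$ (the value of $\xi_h$ is $\xi_1$, $\xi_2$, or $\xi_3$). We conclude

\[ e(h) = O(h^2 f'''(\xi_h)). \]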
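The conclusion $e(h) = O(h^2 f'''(\xi_h))$ can be checked numerically. The following Python sketch assumes the test function $f(x) = e^x$ at $x_0 = 0$ (a choice made only for this illustration, since every derivative is then easy to evaluate) and uses hypothetical function names. Halving $h$ should divide the error by about 4, and $e(h)/h^2$ should be close to $-\frac{11}{72} f'''(x_0)$, the leading coefficient found in the discovery step.

```python
import math

def approx_4_3_2(f, x0, h):
    """Right-hand side of (4.3.2): (-2 f(x0-h) + f(x0) + f(x0+h)) / (3h)."""
    return (-2.0 * f(x0 - h) + f(x0) + f(x0 + h)) / (3.0 * h)

def error_4_3_2(f, fprime, x0, h):
    """e(h) = f'(x0 - h/6) minus the approximation (4.3.2)."""
    return fprime(x0 - h / 6.0) - approx_4_3_2(f, x0, h)

if __name__ == "__main__":
    x0 = 0.0
    for h in (0.04, 0.02, 0.01):
        # exp is its own derivative, so it serves as both f and f'.
        e = error_4_3_2(math.exp, math.exp, x0, h)
        print(f"h = {h:5.2f}   e(h) = {e: .6e}   e(h)/h^2 = {e / h**2: .6f}")
```

The step sizes here are deliberately moderate so that floating-point error does not interfere with the observed rate.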


In general, $\xi_h$ is guaranteed to be between the least node and the greatest node. In the case of an integral
approximation, the endpoints of integration are treated as nodes for the purpose of locating $\xi_h$.

Gaussian quadrature

Ultimately, the accuracy of a numerical calculus formula is measured by its error term, a quantity having the form
$O(h^n f^{(k)}(\xi_h))$. If we are interested in the rate of convergence, we consider $n$, the power of $h$ appearing in the error
term. The greater the power, the speedier the convergence. However, if we are interested in the largest class of
polynomials for which the formula is exact, we need to consider the value $k$, the order of the derivative appearing
in the error term. The greater $k$ is, the larger the class of polynomials for which the formula is exact. In fact, if the
error term contains a factor of $f^{(k)}(\xi_h)$, then the formula is exact for all polynomials up to (and including) degree
$k - 1$. The further implication is that there are degree $k$ polynomials for which the formula is not exact, for if this
were not the case, then the error term would involve a higher derivative. We call the value $k - 1$ the degree of
precision. Formally, the degree of precision of a numerical calculus formula is the integer $m$ such that the formula
is exact for all polynomials of degree up to and including $m$ but is not exact for all polynomials of degree $m + 1$.
Gaussian quadrature formulas aim to maximize the degree of precision for integral formulas.
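The formal definition translates directly into a test on monomials, since a formula that is exact for $1, x, \ldots, x^m$ is exact for every polynomial of degree at most $m$ by linearity. The Python sketch below works with exact rational arithmetic on the reference interval $[-1, 1]$ (that is, $x_0 = 0$ and $h = 1$), on the assumption that rescaling the interval does not change the degree of precision; the function names are hypothetical.

```python
from fractions import Fraction

def exact_monomial_integral(m):
    """Integral of x**m over [-1, 1]."""
    return Fraction(0) if m % 2 else Fraction(2, m + 1)

def degree_of_precision(nodes, weights, max_degree=20):
    """Largest m such that sum(w_i * f(t_i)) integrates 1, x, ..., x**m
    exactly over [-1, 1]; returns -1 if the rule fails for constants."""
    for m in range(max_degree + 1):
        rule = sum(Fraction(w) * Fraction(t) ** m
                   for t, w in zip(nodes, weights))
        if rule != exact_monomial_integral(m):
            return m - 1
    return max_degree

if __name__ == "__main__":
    third = Fraction(1, 3)
    print("midpoint: ", degree_of_precision([0], [2]))
    print("trapezoid:", degree_of_precision([-1, 1], [1, 1]))
    print("Simpson:  ", degree_of_precision([-1, 0, 1], [third, 4 * third, third]))
```

The rules are given by their nodes and weights on $[-1, 1]$, so Simpson's rule appears with weights $\frac{1}{3}, \frac{4}{3}, \frac{1}{3}$.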

The numerical derivatives and integrals over a stencil with $n + 1$ points that we have derived so far are exact
for all polynomials up to degree $n$ as they must be. They have degree of precision at least $n$. As it turns out, a
select few have degree of precision greater than $n$. Consider the second derivative approximation over the stencil
with nodes at $-1$, $0$, and $1$ and point of evaluation $0$.

The stencil has three points, so we expect it to be exact for all polynomials up to degree 2 (and it is). However, its
error term is $O(h^2 f^{(4)}(\xi_h))$, indicating that the formula is exact for all polynomials up to degree 3. The degree of
precision is actually 3, not 2. The first derivative formula over the same stencil is similar. Though it has an error
term of $-\frac{h^2}{6} f'''(\xi_h)$, indicating that the formula has degree of precision 2 as expected, the formula itself only involves
two of the three points available! The coefficient of $f(x_0)$ turns out to be zero. It follows that we can derive the
same formula using the stencil with nodes at $-1$ and $1$ only (the point of evaluation, $0$, is not a node),
having only two points yet having degree of precision 2. Several other centered differences have this attribute. The
Newton-Cotes formulas with an odd number of nodes also have this property. Their error terms exceed degree of
precision expectations by one degree. We noted earlier that a centrally located point of evaluation tends to increase
accuracy, and now we see that the increase can be dramatic.

What we might gather from these observations is that it is not only the number of nodes that determines the 
error term of a numerical calculus formula. The location of the nodes is also important. Up to now, we have only 
seen how node location affects derivative approximation. We know that centrally locating the point of evaluation 
generally increases accuracy. We now take up the question of how to locate nodes in order to increase the accuracy 
of integral formulas. The idea of a centralized point of evaluation has no meaning in this context, however. Integrals 
do not have a single point of evaluation. They are taken over an interval. It is the locations of the nodes relative
to the endpoints of integration that are important. We now find out where to put the nodes to attain the greatest
degree of precision for any given number of nodes. 

Let $G_n$ be the $n$th Legendre polynomial, defined recursively by

\begin{align*}
G_{n+1}(x) &= \frac{(2n+1)xG_n(x) - nG_{n-1}(x)}{n+1} \\
G_0(x) &= 1 \\
G_1(x) &= x.
\end{align*}

We set the $\theta_i$ equal to the roots of $G_n$ to derive the $n$-point quadrature formula over the interval $[x_0 - h, x_0 + h]$
with greatest degree of precision possible. With placement of the nodes chosen, we force the formula to be exact
for polynomials up to degree $n - 1$ as we did earlier. The difference this time is, due to the particular values of $\theta_i$,
the resulting formula will be exact for all polynomials up to degree $2n - 1$. When the nodes are placed at the roots
of the $n$th Legendre polynomial, we get a quadrature formula for $\int_{x_0-h}^{x_0+h} f(x)\,dx$ that exceeds the expected degree of
precision by $n$, the number of nodes!
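The recursion is easy to carry out mechanically. As a rough sketch (in Python, with a hypothetical function name and polynomials stored as coefficient lists, lowest degree first), the Legendre polynomials can be generated exactly with rational arithmetic:

```python
from fractions import Fraction

def legendre_coefficients(n):
    """Coefficients of G_n, lowest degree first, from the recursion
    G_{k+1}(x) = ((2k+1) x G_k(x) - k G_{k-1}(x)) / (k+1),
    with G_0(x) = 1 and G_1(x) = x."""
    previous, current = [Fraction(1)], [Fraction(0), Fraction(1)]
    if n == 0:
        return previous
    for k in range(1, n):
        shifted = [Fraction(0)] + current  # coefficients of x * G_k
        padded = previous + [Fraction(0)] * (len(shifted) - len(previous))
        nxt = [((2 * k + 1) * s - k * p) / (k + 1)
               for s, p in zip(shifted, padded)]
        previous, current = current, nxt
    return current

if __name__ == "__main__":
    for n in range(5):
        print(n, [str(c) for c in legendre_coefficients(n)])
```

The roots of $G_n$ then have to be found separately, by hand for small $n$ or with a root finding method for larger $n$.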

We demonstrate for n = 1 and n = 3. 


$G_1(x) = x$ has for its only root, 0. Hence, we seek a formula of the form

\[ \int_{x_0-h}^{x_0+h} f(x)\,dx \approx a_0 f(x_0) \]

which is exact for polynomials up to degree 0. The one equation for the one unknown, $a_0$, is

\[ \int_{x_0-h}^{x_0+h} (1)\,dx = a_0(1) \]

or $2h = a_0$. Hence, we have

\[ \int_{x_0-h}^{x_0+h} f(x)\,dx \approx 2hf(x_0), \]

which we claim has degree of precision 1, not 0. Indeed, for $f(x) = x - x_0$,

\[ \int_{x_0-h}^{x_0+h} f(x)\,dx = \left. \tfrac{1}{2}(x - x_0)^2 \right|_{x_0-h}^{x_0+h} = 0 \]

and

\[ 2hf(x_0) = 2h(x_0 - x_0) = 0, \]

so it is exact for degree one polynomials. However, for $f(x) = (x - x_0)^2$,

\[ \int_{x_0-h}^{x_0+h} f(x)\,dx = \left. \tfrac{1}{3}(x - x_0)^3 \right|_{x_0-h}^{x_0+h} = \tfrac{2}{3}h^3 \]

and

\[ 2hf(x_0) = 2h(x_0 - x_0)^2 = 0, \]

so it is not exact for all degree two polynomials. Therefore, its degree of precision is 1. Note the formula
$\int_{x_0-h}^{x_0+h} f(x)\,dx \approx 2hf(x_0)$ is equivalent to the Midpoint Rule as found in Table 4.5.

Now

\[ G_2(x) = \frac{3xG_1(x) - G_0(x)}{2} = \tfrac{1}{2}(3x^2 - 1) \]

so


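\[ G_3(x) = \frac{5xG_2(x) - 2G_1(x)}{3} = \frac{\tfrac{5}{2}(3x^3 - x) - 2x}{3} = \frac{5(3x^3 - x) - 4x}{6} = \frac{15x^3 - 9x}{6} = \tfrac{1}{2}(5x^3 - 3x), \]

which has roots $-\sqrt{3/5}$, $0$, and $\sqrt{3/5}$. Hence, we seek a formula of the form

\[ \int_{x_0-h}^{x_0+h} f(x)\,dx \approx a_0 f\left(x_0 - \sqrt{\tfrac{3}{5}}\,h\right) + a_1 f(x_0) + a_2 f\left(x_0 + \sqrt{\tfrac{3}{5}}\,h\right) \]

which is exact for polynomials up to degree 2. The three equations for the three unknowns are

\begin{align*}
\int_{x_0-h}^{x_0+h} (1)\,dx &= 2h = a_0 + a_1 + a_2 \\
\int_{x_0-h}^{x_0+h} (x - x_0)\,dx &= 0 = -\sqrt{\tfrac{3}{5}}\,h a_0 + \sqrt{\tfrac{3}{5}}\,h a_2 \\
\int_{x_0-h}^{x_0+h} (x - x_0)^2\,dx &= \tfrac{2}{3}h^3 = \tfrac{3}{5}h^2 a_0 + \tfrac{3}{5}h^2 a_2.
\end{align*}

The solution is

\[ a_0 = a_2 = \tfrac{5}{9}h \quad \text{and} \quad a_1 = \tfrac{8}{9}h, \]

so the quadrature formula is

\[ \int_{x_0-h}^{x_0+h} f(x)\,dx \approx \frac{h}{9}\left[ 5f\left(x_0 - \sqrt{\tfrac{3}{5}}\,h\right) + 8f(x_0) + 5f\left(x_0 + \sqrt{\tfrac{3}{5}}\,h\right) \right]. \]

The formula was derived to be exact for polynomials up to degree 2, so its degree of precision is at least 2. We
claim the degree of precision is actually 5. For $f(x) = (x - x_0)^3$,

\[ \int_{x_0-h}^{x_0+h} f(x)\,dx = \left. \tfrac{1}{4}(x - x_0)^4 \right|_{x_0-h}^{x_0+h} = 0 \]

and

\[ \frac{h}{9}\left[ 5f\left(x_0 - \sqrt{\tfrac{3}{5}}\,h\right) + 8f(x_0) + 5f\left(x_0 + \sqrt{\tfrac{3}{5}}\,h\right) \right] = \frac{h}{9}\left[ 5\left(-\sqrt{\tfrac{3}{5}}\,h\right)^3 + 0 + 5\left(\sqrt{\tfrac{3}{5}}\,h\right)^3 \right] = 0, \]

so it is exact for degree three polynomials. For $f(x) = (x - x_0)^4$,

\[ \int_{x_0-h}^{x_0+h} f(x)\,dx = \left. \tfrac{1}{5}(x - x_0)^5 \right|_{x_0-h}^{x_0+h} = \tfrac{2}{5}h^5 \]

and

\[ \frac{h}{9}\left[ 5\left(-\sqrt{\tfrac{3}{5}}\,h\right)^4 + 0 + 5\left(\sqrt{\tfrac{3}{5}}\,h\right)^4 \right] = \frac{5h}{9}\left[ \tfrac{9}{25}h^4 + \tfrac{9}{25}h^4 \right] = \tfrac{2}{5}h^5, \]

so it is exact for degree four polynomials. For $f(x) = (x - x_0)^5$,

\[ \int_{x_0-h}^{x_0+h} f(x)\,dx = \left. \tfrac{1}{6}(x - x_0)^6 \right|_{x_0-h}^{x_0+h} = 0 \]

and

\[ \frac{h}{9}\left[ 5\left(-\sqrt{\tfrac{3}{5}}\,h\right)^5 + 0 + 5\left(\sqrt{\tfrac{3}{5}}\,h\right)^5 \right] = 0, \]

so it is exact for degree five polynomials. However, for $f(x) = (x - x_0)^6$,

\[ \int_{x_0-h}^{x_0+h} f(x)\,dx = \left. \tfrac{1}{7}(x - x_0)^7 \right|_{x_0-h}^{x_0+h} = \tfrac{2}{7}h^7 \]

and

\[ \frac{h}{9}\left[ 5\left(-\sqrt{\tfrac{3}{5}}\,h\right)^6 + 0 + 5\left(\sqrt{\tfrac{3}{5}}\,h\right)^6 \right] = \frac{5h}{9}\left[ \tfrac{27}{125}h^6 + \tfrac{27}{125}h^6 \right] = \tfrac{6}{25}h^7, \]

so it is not exact for all degree six polynomials. Its degree of precision is 5. The formula is listed as the second
Gaussian quadrature formula in table 4.5.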
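The three-point formula is short enough to try out directly. The Python sketch below uses a hypothetical function name, `gauss3`, and ordinary floating-point arithmetic; it applies the formula to the monomials $x^m$ on $[-1, 1]$ and to $\sin x$ on $[0, \pi]$, an integrand chosen only for illustration.

```python
import math

def gauss3(f, x0, h):
    """Three-point Gaussian quadrature estimate of the integral of f
    over [x0 - h, x0 + h], with nodes x0 and x0 +/- sqrt(3/5) h."""
    s = math.sqrt(3.0 / 5.0) * h
    return h / 9.0 * (5.0 * f(x0 - s) + 8.0 * f(x0) + 5.0 * f(x0 + s))

if __name__ == "__main__":
    # Monomials on [-1, 1]: exact values are 0 (odd m) or 2/(m+1) (even m).
    for m in range(8):
        exact = 0.0 if m % 2 else 2.0 / (m + 1)
        estimate = gauss3(lambda x: x ** m, 0.0, 1.0)
        print(m, estimate, exact)
    # A non-polynomial integrand: the integral of sin over [0, pi] is 2.
    print(gauss3(math.sin, math.pi / 2.0, math.pi / 2.0))
```

Up to rounding, the estimates agree with the exact integrals through degree 5 and first disagree at degree 6, in line with the degree of precision found above.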

We can also find the degree of precision of any numerical calculus formula by observing the form of its error
term. If the error term has the form $O(h^n f^{(k)}(\xi_h))$, then its degree of precision is $k - 1$.

Some standard formulas 

Tables 4.2, 4.3, 4.4, and 4.5 summarize some standard formulas for derivatives and integrals. Notice there are no
one-point formulas for any derivatives, no two-point formulas for second derivatives or higher, and no three-point
formulas for third derivatives or higher. The stencils have been streamlined to show only the values of $\theta_i$. Hence,
the stencil with nodes $x_0 - h$, $x_0$, and $x_0 + h$ and point of evaluation $x_0$ appears in the table as the stencil
$-1$, $0$, $1$ with $0$ marked as the point of evaluation.

Key Concepts 

Degree of precision: The integer m such that a numerical calculus formula is exact for all polynomials of degree 
up to and including m but is not exact for all polynomials of degree m + 1. 

Error terms: Error terms for numerical calculus approximations can be found by replacing all occurrences of $f$
in an approximation formula by Taylor series expansions about $x_0$ and reducing.

Gaussian quadrature: A quadrature method which maximizes the degree of precision relative to the number of 
nodes used. 


Quadrature: Another name for a numerical integration formula. 

Weighted Mean Value Theorem: Assume that $f$ and $g$ are continuous on $[a, b]$. If $g$ never changes sign and is
non-negative in $[a, b]$, then we have that

\[ \int_a^b f(x)g(x)\,dx = f(c) \int_a^b g(x)\,dx \]

for some $c$ in $(a, b)$.


[Tables 4.2, 4.3, and 4.4 (derivative formulas, with columns Formula and Name) are not recoverable from the extracted text; only a fragment of the Backward Difference row survives.]

Table 4.5: Some integration formulas. [The table contents are not recoverable from the extracted text.]

Exercises

1. Let $f(x) = e^x - \sin x$. Complete the following table using the approximation formula

\[ f'(x_0) \approx \frac{-3f(x_0) + 4f(x_0 + h) - f(x_0 + 2h)}{2h}. \]

h        approximate f'(2)    abs. error
.01
.005
-.005
-.01

Is it OK to use negative values for $h$?

2. For each value of $x$ in the table, use the most accurate three-point formula to approximate $f'(x)$.

x       f(x)        f'(x)
-2.7    0.054797
-2.5    0.11342
-2.3    0.65536
-2.1    0.98472

3. Approximate the integral using Simpson’s rule.

[Parts (a) through (e): the integrands $x \ln(x + 1)$, $\ln(x + 1)$, and $(\cos x)^2$ are legible, but the limits of integration and the remaining integrands are not recoverable from the extracted text.]

4. Do question 3 using the Trapezoidal rule.

5. Do question 3 using the Midpoint rule.

6. Find the error of the approximation in question 3.

7. Find the error of the approximation in question 4.

8. Find the error of the approximation in question 5.

9. Find the error in approximating $\int (32x^2 + \sqrt{7}x - 2)\,dx$ using Simpson’s $\frac{3}{8}$ Rule. [The limits of integration are not legible in the extracted text.]

10. Find the error in approximating $\int (32x^5 + 7x^3 - 2)\,dx$ using Bode’s Rule. [The limits of integration are not legible in the extracted text.]

11. For the following values of $f$, $x_0$, and $h$, use the formula

\[ f'(x_0) = \frac{f(x_0 + h) - f(x_0 - h)}{2h} - \frac{h^2}{6} f'''(\xi) \]

to approximate $f'(x_0)$.

(a) $f(x) = e^x$; $x_0 = 2$; $h = 0.1$.

(b) $f(x) = (\cosh 2x)^2 - \sin x$; $x_0 = \pi$; $h = 0.05$.

(c) $f(x) = \ln(2x - 3) + 5x$; $x_0 = 10$; $h = 1$.

12. Compute both a lower bound and an upper bound on the error for the approximation in question 11. Verify that the
actual error is between these bounds.

13. For each part of question 11, find the value of $\xi$ guaranteed by the formula.

14. State the degree of precision of the closed Newton-Cotes formula on 5 nodes, Bode’s Rule.

15. State the degree of precision of the five point formula

\[ f'(x_0) = \frac{1}{12h}\left[ -25f(x_0) + 48f(x_0 + h) - 36f(x_0 + 2h) + 16f(x_0 + 3h) - 3f(x_0 + 4h) \right] + \frac{h^4}{5} f^{(5)}(\xi). \]

16. Find the degree of precision of the quadrature formula

[The formula is not fully legible in the extracted text; it approximates an integral with upper limit 5 by a multiple of $3f(\cdot) + f(5)$.]

17. Find the error term for the quadrature method, and state its degree of precision.

(a) $\int_{x_0}^{x_0+h} f(x)\,dx \approx hf(x_0)$ [A]

[Parts (b) through (h) are not recoverable from the extracted text.]

18. Find the error term for the derivative approximation:

(a) [not legible in the extracted text]

(b) $f'(x_0) \approx \dfrac{f(x_0 + 2h) - f(x_0 - h)}{3h}$

(c) [not fully legible in the extracted text]

(d) $f'(x_0) \approx \dfrac{-13f(x_0 - 10h) - 12f(x_0 + 5h) + 25f(x_0 + 8h)}{270h}$

(e) [not fully legible in the extracted text]

(f) $f''(x_0) \approx \dfrac{2f(x_0 - h) - 3f(x_0) + f(x_0 + 2h)}{3h^2}$

(g) $f''(x_0) \approx \dfrac{7f(x_0 - 5h) - 12f(x_0) + 5f(x_0 + 7h)}{210h^2}$

(h) $f''(x_0) \approx \dfrac{5f(x_0 - 5h) - 12f(x_0 + 2h) + 7f(x_0 + 7h)}{210h^2}$

(i) $f''(x_0) \approx \dfrac{5f(x_0 - 2h) + 32f(x_0 - h) - 60f(x_0) + 25f(x_0 + 2h) - 2f(x_0 + 4h)}{60h^2}$

19. Diffy Renee writes down the following approximation:

\[ f''(3.0) \approx 25[\sin(2.8) - 2\sin(3.0) + \sin(3.2)]. \]

What is $f(x)$? [S]

20. Let $f(x) = \sin x$.

(a) Find a bound on the error of the approximation

\[ f'(6) \approx \frac{-3\sin 6 + 4\sin 6.1 - \sin 6.2}{0.2} \]

according to the appropriate error term.

(b) Compare this bound to the actual error.

21. What can you say about the error in approximating the first derivative of

\[ f(x) = -13x^4 + 17x^3 - 15x^2 + 12x - 99 \]

using a 5-point formula?

22. Let $f(x) = 3x^3 - 2x^2 + x$.

(a) Compute the error (not a bound on the error) in estimating $f'(2)$ using the forward difference
$\frac{f(x_0 + h) - f(x_0)}{h}$ with $h = 0.1$.

(b) Find $\xi_{0.1}$ as guaranteed by the error term.

23. Let $f(x) = \sin x$. Find a bound on the error of the approximation.

(a) $f''(3.0) \approx 25[\sin(2.8) - 2\sin(3.0) + \sin(3.2)]$ [A]

(b) $f''(3.0) \approx 1600[2\sin(3.0) - 5\sin(3.025) + 4\sin(3.05) - \sin(3.075)]$

(c) $f'''(3.0) \approx 500000[-5\sin(3.0) + 18\sin(3.01) - 24\sin(3.02) + 14\sin(3.03) - 3\sin(3.04)]$

(d) $f'''(3.0) \approx 1000[-\sin(2.8) + 3\sin(2.9) - 3\sin(3.0) + \sin(3.1)]$

(e) $\int_3^4 f(x)\,dx \approx \frac{1}{6}[\sin(3) + 4\sin(3.5) + \sin(4)]$

(f) $\int_3^4 f(x)\,dx \approx \frac{1}{2}\left[ \sin\left(\frac{7}{2} - \frac{1}{2\sqrt{3}}\right) + \sin\left(\frac{7}{2} + \frac{1}{2\sqrt{3}}\right) \right]$

24. Suppose you have the following data on a function $f$.

x       0          1          2          3          4
f(x)    -0.2381    -0.3125    -0.4545    -0.8333    -5

(a) Approximate $f'(4)$ and $f'(2)$ using 5-point formulas.

(b) Which approximation would you expect to be more accurate, and why?

(c) Did it turn out that way? The data came from $f(x) = \frac{1}{x - 4.2}$.

25. Refer to the quadrature method

[The formula is not fully legible in the extracted text; it approximates $\int_{x_0}^{x_0+h} f(x)\,dx$ using two evaluations of $f$.]

in all of the following questions.

(a) What is the rate of convergence?

(b) What is the degree of precision?

(c) Use the method to approximate $\int \sin x\,dx$. [The limits of integration are not legible in the extracted text.]

(d) Find a bound on the error of this approximation.

(e) Compare the bound to the actual error.

26. The Trapezoidal rule applied to $\int_0 f(x)\,dx$ gives the value 5, and the Midpoint rule gives the value 4. What value
does Simpson’s rule give? [The upper limit of integration is not legible in the extracted text.]

27. The Trapezoidal Rule applied to $\int_0^2 f(x)\,dx$ gives the value 4, and Simpson’s Rule gives the value 2. What is $f(1)$?

28. When approximating $f'''(x_0)$ using five nodes, the rate of convergence will be at least what?

29. Show that the average of the forward difference, $\frac{f(x_0 + h) - f(x_0)}{h}$, and backward difference, $\frac{f(x_0) - f(x_0 - h)}{h}$, approximations
of $f'(x_0)$ gives the central difference approximation, $\frac{f(x_0 + h) - f(x_0 - h)}{2h}$, of $f'(x_0)$.

30. Chuck was “approximating” a definite integral using Simpson’s Rule. As you can see from his work below, he was
integrating a cubic polynomial. Calculate the error he incurred even though you can not read all the coefficients.
[Chuck’s work is not reproduced in the extracted text.]

31. Repeat 30 supposing Chuck was using the Trapezoidal Rule.

32. Sketch the graph of a function $f(x)$, and indicate on it values for $x_0$ and $h$ so that the backward difference
gives a better approximation of $f'(x_0)$ than does the central difference $\frac{f(x_0 + h) - f(x_0 - h)}{2h}$.

33. Sketch the graph of a function $f(x)$ for which the Trapezoidal Rule gives a better approximation of $\int f(x)\,dx$ than
does Simpson’s Rule, and explain how you know. [The limits of integration are not legible in the extracted text.]

34. Suppose a 5 point formula is used to approximate $f''(x_0)$ for stepsizes $h = 0.1$ and $h = 0.02$. If $E_{0.1}$ represents the
error in the approximation for $h = 0.1$ and $E_{0.02}$ represents the error in the approximation for $h = 0.02$, what would
you expect the ratio of these errors to be, approximately?

35. A general three point formula using nodes $x_0$, $x_0 + \alpha h$, and $x_0 + 2h$, ($\alpha \ne 0, 2$) is given by

\[ f'(x_0) \approx \frac{1}{2h}\left[ -\frac{2 + \alpha}{\alpha} f(x_0) + \frac{4}{\alpha(2 - \alpha)} f(x_0 + \alpha h) - \frac{\alpha}{2 - \alpha} f(x_0 + 2h) \right]. \]

(a) Show that this formula reduces to one of the standard formulas when $\alpha = 1$.

(b) Find the error term for this formula.

36. Find three different approximations for $f'(0.2)$ using three-point formulas.

x      f(x)
0      1
0.1    1.10517
0.2    1.22140
0.3    1.34986
0.4    1.49182

The graph of $f'''(x)$ is shown below. Use it to rank your three approximations in order from least expected error to
greatest expected error, and explain why you ranked them the way you did. [The graph is not reproduced in the extracted text.]

37. Verify numerically that the error in using the formula $f'(x_0) \approx \frac{-2f(x_0 - h) - 3f(x_0) + 6f(x_0 + h) - f(x_0 + 2h)}{6h}$ to approximate
$f'(3)$ using the function $f(x) = (\cos 3x)^2 + \ln x$ is really $O(h^3)$.

38. Numerically approximate the best estimate that can be obtained from the formula

\[ f'(3) \approx \frac{-2f(3 - h) - 3f(3) + 6f(3 + h) - f(3 + 2h)}{6h} \]

with double precision (standard Octave) computation and $f(x) = (\cos 3x)^2 + \ln x$. What value of $h$ gives this optimal
approximation?

39. Find the degree of precision of the quadrature formula

[The formula is not reproduced in the extracted text.]

40. The quadrature formula $\int_0 f(x)\,dx = c_0 f(0) + c_1 f(1) + c_2 f(2)$ is exact for all polynomials of degree less than or equal
to 2. Determine $c_0$, $c_1$, and $c_2$. [The upper limit of integration is not legible in the extracted text.]




4.4 Composite Integration 


In section 4.3 we supplied error terms that took the form $O(h^k f^{(l)}(\xi_h))$. As a prime example, the trapezoidal rule,

$$\int_{x_0}^{x_0+h} f(x)\,dx = \frac{h}{2}\left[f(x_0) + f(x_0+h)\right] + O(h^3 f''(\xi_h)),$$

has error term $O(h^3 f''(\xi_h))$. This conclusion follows directly from a Taylor series analysis, but what does it mean?


Error terms for derivative approximations are comparatively easy to understand. Consider the first derivative approximation

$$f'(x_0) = \frac{-f(x_0-h) + f(x_0+h)}{2h} - \frac{h^2}{6}f'''(\xi_h).$$

The smaller $h$ is, the smaller the error in approximating $f'(x_0)$ is (as long as the $f'''(\xi_h)$ term doesn’t counteract the benefit of shrinking $h$). Error terms for integral approximations are not as straightforward because, in each case, the quantity being approximated depends on $h$. Changing $h$ in the integration formula also changes the quantity being approximated. This is true of each formula in table 4.5. The trapezoidal rule is as good an example as any. The left hand side, the quantity being approximated, is $\int_{x_0}^{x_0+h} f(x)\,dx$, so smaller $h$ means approximating the integral over a smaller interval. So how does having a smaller error in approximating a different number tell us anything about the potential benefit of computing with smaller values of $h$? Careful study of the trapezoidal rule will reveal the answer.

According to the trapezoidal rule, $\frac{h}{2}\left[f(x_0) + f(x_0+h)\right]$ approximates the integral of $f$ over the interval $[x_0, x_0+h]$. If $h$ is replaced by $h/2$, the resulting approximation, $\frac{h}{4}\left[f(x_0) + f(x_0+\frac{h}{2})\right]$, is an approximation of the integral of $f$ over the interval $[x_0, x_0+\frac{h}{2}]$. It is no longer an approximation of the integral over $[x_0, x_0+h]$! To use the trapezoidal rule to approximate the original quantity, the integral of $f$ over $[x_0, x_0+h]$, using $h/2$ instead of $h$ requires two applications of the trapezoidal rule: one over the interval $[x_0, x_0+\frac{h}{2}]$ and one over the interval $[x_0+\frac{h}{2}, x_0+h]$. The sum of these two approximations is an approximation for the integral of $f$ over $[x_0, x_0+h]$. Reducing $h$ further requires more applications of the trapezoidal rule over more intervals. In general, reducing $h$ to $\frac{h}{n}$ for any whole number $n$ requires $n$ applications of the trapezoidal rule:

$$\begin{aligned}
\int_{x_0}^{x_0+h} f(x)\,dx &= \int_{x_0}^{x_0+\frac{h}{n}} f(x)\,dx + \int_{x_0+\frac{h}{n}}^{x_0+2\frac{h}{n}} f(x)\,dx + \cdots + \int_{x_0+(n-1)\frac{h}{n}}^{x_0+h} f(x)\,dx \\
&\approx \frac{h}{2n}\left[f(x_0) + f\left(x_0+\frac{h}{n}\right)\right] + \frac{h}{2n}\left[f\left(x_0+\frac{h}{n}\right) + f\left(x_0+2\frac{h}{n}\right)\right] + \cdots \\
&\qquad + \frac{h}{2n}\left[f\left(x_0+(n-1)\frac{h}{n}\right) + f(x_0+h)\right]
\end{aligned} \tag{4.4.1}$$


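Decomposing $\int_{x_0}^{x_0+h} f(x)\,dx$ into the sum $\int_{x_0}^{x_0+\frac{h}{n}} f(x)\,dx + \int_{x_0+\frac{h}{n}}^{x_0+2\frac{h}{n}} f(x)\,dx + \cdots + \int_{x_0+(n-1)\frac{h}{n}}^{x_0+h} f(x)\,dx$ and summing approximations of these integrals is called composite integration.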

As for using the trapezoidal rule to do the approximating, the error in a single application of the trapezoidal rule is $O(h^3 f''(\xi_h))$. The error in the above sum is, therefore, bounded by

$$\sum_{i=1}^{n} M\left(\frac{h}{n}\right)^3 f''(\mu_i) = Mh\left(\frac{h}{n}\right)^2 \cdot \frac{1}{n}\sum_{i=1}^{n} f''(\mu_i)$$

for some $\mu_i$ with $x_0 + (i-1)\frac{h}{n} < \mu_i < x_0 + i\frac{h}{n}$. Assuming $f''$ is continuous on $[x_0, x_0+h]$, the intermediate value theorem allows us to replace $\frac{1}{n}\sum_{i=1}^{n} f''(\mu_i)$ by $f''(\xi_n)$ for some $\xi_n \in (x_0, x_0+h)$ because $\frac{1}{n}\sum_{i=1}^{n} f''(\mu_i)$ is the average of the $f''(\mu_i)$, which is no more than the maximum of the $f''(\mu_i)$ and no less than the minimum of the $f''(\mu_i)$.

Making this replacement gives us the error bound $Mh\left(\frac{h}{n}\right)^2 f''(\xi_n)$. In conclusion, the trapezoidal rule used multiple times when necessary to approximate $\int_{x_0}^{x_0+h} f(x)\,dx$ actually has error $O\left(\left(\frac{1}{n}\right)^2 f''(\xi_n)\right)$, where $n$ is the number of subintervals used in the calculation and $\xi_n$ depends on $n$. Now the nature of the error is clearer. It is measured by how many subintervals are used in the calculation. More subintervals (greater $n$) means less error (assuming the benefit of more subintervals is not counteracted by the $f''$ factor). Other composite integration formulas are similar. If a single-interval quadrature formula has error $O(h^k f^{(l)}(\xi_h))$, then the corresponding composite version has error $O\left(\left(\frac{1}{n}\right)^{k-1} f^{(l)}(\xi_n)\right)$. More intervals generally means smaller error.
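A quick numerical check may make the $O\left(\left(\frac{1}{n}\right)^2\right)$ claim more tangible. The following is only a sketch in Python (not the Octave used elsewhere in the book); the function name `trapezoid_pieces` and the test integrand $e^x$ on $[0, 1]$ are assumptions chosen for illustration. It sums $n$ simple trapezoidal rules exactly as written in equation 4.4.1 and prints the error multiplied by $n^2$, which should stay roughly constant if the error really behaves like a constant times $\left(\frac{1}{n}\right)^2$.

```python
import math

def trapezoid_pieces(f, x0, h, n):
    """Sum n simple trapezoidal rules over [x0, x0 + h], as in equation 4.4.1."""
    step = h / n
    total = 0.0
    for i in range(n):
        left = x0 + i * step
        right = left + step
        total += step / 2 * (f(left) + f(right))
    return total

exact = math.e - 1  # integral of e^x over [0, 1]
errors = {n: abs(exact - trapezoid_pieces(math.exp, 0.0, 1.0, n)) for n in (10, 20, 40)}
for n, err in errors.items():
    # err * n^2 staying nearly constant is the signature of O((1/n)^2) error
    print(n, err, err * n**2)
```

Note that this literal form evaluates every interior node twice, which is wasteful but mirrors the derivation.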


Composite Trapezoidal Rule 

Table 4.6: Minimum number of intervals to achieve certain accuracies using the composite trapezoidal rule to approximate $\int_0^3 e^{-x^2}\,dx$.

    accuracy       2.2(10)^-2   5(10)^-5   10^-5   10^-7   10^-11   10^-15
    subintervals   2            3          8       75      7453     > 745300

Equation 4.4.1 encapsulates the composite trapezoidal rule but does not represent the most efficient way to use it. Simplifying the expression will help. Notice that all of the function evaluations except $f(x_0)$ and $f(x_0+h)$ occur twice, so we can condense the formula to

$$\begin{aligned}
\int_{x_0}^{x_0+h} f(x)\,dx &\approx \frac{h}{2n}\left[f(x_0) + f(x_0+h)\right] + \frac{h}{n}\left[f\left(x_0+\frac{h}{n}\right) + \cdots + f\left(x_0+(n-1)\frac{h}{n}\right)\right] \\
&= \frac{h}{2n}\left[f(x_0) + f(x_0+h) + 2\sum_{i=1}^{n-1} f\left(x_0 + i\frac{h}{n}\right)\right].
\end{aligned}$$

This leads to the following pseudo-code where we make the substitutions $a = x_0$ and $b = x_0 + h$.


Assumptions: $f$ has a continuous second derivative on $[a, b]$.

Input: Function $f$; interval over which to integrate $[a, b]$; number of subintervals $n$.

Step 1: Set $s = \frac{b-a}{n}$; $I = \frac{f(a) + f(b)}{2}$;

Step 2: For $i = 1, 2, \ldots, n-1$ do Step 3:

Step 3: Set $I = I + f(a + is)$;

Step 4: Set $I = sI$;

Output: Approximate value of $\int_a^b f(x)\,dx$.

Other composite integration formulas should be simplified likewise to minimize the number of times / is evaluated. 
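For readers who would like to try the pseudo-code right away, here is one possible translation into Python. It is a sketch only (the book's own code is in Octave), and the name `composite_trapezoid` and the sample integral of $\sin x$ over $[0, 1]$ with 5 subintervals are assumptions made for the example.

```python
import math

def composite_trapezoid(f, a, b, n):
    """Composite trapezoidal rule on [a, b] with n subintervals."""
    s = (b - a) / n              # Step 1: width of each subinterval
    total = (f(a) + f(b)) / 2    # Step 1: the two endpoints carry half weight
    for i in range(1, n):        # Steps 2-3: each interior node is evaluated once
        total += f(a + i * s)
    return s * total             # Step 4

approx = composite_trapezoid(math.sin, 0.0, 1.0, 5)
print(approx)
```

With $n$ subintervals this version calls $f$ exactly $n + 1$ times, compared with $2n$ calls for a literal reading of equation 4.4.1.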


Adaptive quadrature 

$$\int_0^3 e^{-x^2}\,dx \approx 0.886207348$$

and it is simple enough to approximate this value with the composite trapezoidal rule. Table 4.6 shows the minimum number of subintervals needed to achieve various accuracies, assuming the calculations are done with enough significant digits that floating point error does not overwhelm the calculation. It should be apparent that achieving high accuracy results using the trapezoidal rule is not practical. It requires too many computations. We will take up this deficiency in the next section. For now, let’s analyze the usefulness of the error bound $O\left(\left(\frac{1}{n}\right)^2 f''(\xi_n)\right)$. Assuming $f''(\xi_n)$ is roughly constant, we should expect to improve our estimate from an accuracy of $2.2(10)^{-2}$ to an accuracy of $5(10)^{-5}$, an increase in accuracy of $\frac{2.2(10)^{-2}}{5(10)^{-5}} = 440$ times, by increasing the number of subintervals by a factor of about $\sqrt{440} \approx 21$. In other words, we should expect it to take approximately 42 subintervals to achieve $5(10)^{-5}$ accuracy based on accuracy of $2.2(10)^{-2}$ with 2 intervals. Since it only takes 3, we conclude that the assumption that $f''(\xi_2) \approx f''(\xi_3)$ is bad! Luckily, the badness of this assumption actually works in our favor. It takes fewer, not more, than the expected number of intervals to achieve $5(10)^{-5}$ accuracy. On the other hand, increasing the accuracy from $5(10)^{-5}$ to $10^{-5}$, an increase by a factor of 5, we should expect to need about $\sqrt{5} \approx 2.2$ times as many subintervals. $3 \times 2.2 = 6.6$, so the 8 needed is just about what we would expect. Similarly, to increase the accuracy from $10^{-5}$ to $10^{-7}$, an increase in accuracy by a factor of 100, we should expect to need about 10 times as many subintervals. Indeed, 75 is about 10 times as many as 8. Likewise, to increase accuracy by a factor of 10,000 (as in going from $10^{-7}$ to $10^{-11}$ or from $10^{-11}$ to $10^{-15}$), we should expect to need to increase the number of subintervals by a factor of 100. Indeed, the table bears this estimate out as well.

Crumpet 28: error function

The error function is defined as

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt$$

and is critical in the study of statistics as it is used to calculate probabilities associated with the normal distribution. The factor $\frac{2}{\sqrt{\pi}}$ comes from the fact that $\int_0^\infty e^{-t^2}\,dt = \frac{\sqrt{\pi}}{2}$, an interesting fact itself.

Computer algebra systems will have the error function built-in just as they do the sine or logarithm functions. Hence, the easiest way to evaluate $\int_0^3 e^{-x^2}\,dx$ is to have a computer algebra system (or perhaps your calculator) compute $\frac{\sqrt{\pi}}{2}\operatorname{erf}(3)$.
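The smaller entries of Table 4.6 are easy to check by brute force. The sketch below is an illustration in Python rather than the book's Octave; the helper names `composite_trapezoid` and `min_subintervals` are made up for this example, and the "exact" value is taken from $\frac{\sqrt{\pi}}{2}\operatorname{erf}(3)$ as suggested in Crumpet 28. It simply tries $n = 1, 2, 3, \ldots$ until the absolute error is small enough.

```python
import math

def composite_trapezoid(f, a, b, n):
    """Composite trapezoidal rule on [a, b] with n subintervals."""
    s = (b - a) / n
    total = (f(a) + f(b)) / 2
    for i in range(1, n):
        total += f(a + i * s)
    return s * total

def min_subintervals(f, a, b, exact, accuracy, n_max=100000):
    """Smallest n (found by linear search) whose absolute error is at most `accuracy`."""
    for n in range(1, n_max + 1):
        if abs(exact - composite_trapezoid(f, a, b, n)) <= accuracy:
            return n
    return None  # not reached within n_max subintervals

f = lambda x: math.exp(-x * x)
exact = math.sqrt(math.pi) / 2 * math.erf(3)
for accuracy in (2.2e-2, 5e-5, 1e-5, 1e-7):
    print(accuracy, min_subintervals(f, 0.0, 3.0, exact, accuracy))
```

The linear search is deliberately naive. It is fine for the first few columns of the table but would be painfully slow for the $10^{-11}$ and $10^{-15}$ columns, where floating point error also starts to matter.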

Just remember, if $f''$ does not exist or is wildly discontinuous, or just wildly varying, the assumption that $f''(\xi_n)$ is constant could be a bad one, no matter how many subintervals are used. The more common case is when $f''$ is continuous and reasonably tame, though. Even in this case, when the number of subintervals is small, the assumption is often not a good one, but when the number of subintervals is large, it is a pretty reliable assumption. The exact number of subintervals needed before this assumption is reasonable changes from one function to another, however.

Taking this lesson to heart, we approximate 

$$\int \left(x - e^{x}\cos\sqrt{e^{2x} - x^{2}}\right)dx$$

using the trapezoidal rule with 50 subintervals and find that it is accurate to within about $10^{-1}$ of the exact value. How many subintervals should we expect to need to achieve $10^{-3}$ accuracy? About 10 times as many, or about 500. With 500 subintervals, we actually attain accuracy of about $.997(10)^{-3}$, spot on! The assumption that $f''(\xi_n)$ is constant seems to be valid for this integral with $n \ge 50$ (and maybe for some $n < 50$ too). Alas, this is the type of analysis that cannot be done in practice. In practice, we calculate integrals numerically because we don’t know how to compute their values exactly! In “real life” situations, we have no way of knowing how accurate an integral estimate is with 3 or 50 or 500 or 3000 subintervals. We need the computer to estimate errors as it calculates, just as we had it do for root-finding algorithms.

Even though we know the assumption is not perfect, especially for small $n$, we assume $f''(\xi_n)$ is constant, so the error of the trapezoidal rule becomes $O\left(\left(\frac{1}{n}\right)^2\right)$. The $f''$ factor is subsumed by the implied constant of the big-oh notation. Accordingly, halving the number of intervals can be expected to increase the error by a factor of about 4. Introducing the notation $T_k(a, b)$ for the composite trapezoidal rule approximation of $\int_a^b f(x)\,dx$ with $k$ subintervals and $e_k = \int_a^b f(x)\,dx - T_k(a, b)$ for its error,

$$e_n \approx M\left(\frac{1}{n}\right)^2 \quad\text{and}\quad e_{2n} \approx M\left(\frac{1}{2n}\right)^2 = \frac{1}{4}M\left(\frac{1}{n}\right)^2,$$

which implies $e_n \approx 4e_{2n}$.

Because $\int_a^b f(x)\,dx = T_2(a, b) + e_2 = T_1(a, b) + e_1$,

$$\begin{aligned}
T_2(a, b) - T_1(a, b) &= e_1 - e_2 \\
&\approx 4e_2 - e_2 \\
&= 3e_2
\end{aligned}$$

so $e_2 \approx \frac{1}{3}\left(T_2(a, b) - T_1(a, b)\right)$. Explicitly,

$$\int_a^b f(x)\,dx - T_2(a, b) \approx \frac{1}{3}\left(T_2(a, b) - T_1(a, b)\right).$$


We now have a way of approximating the error numerically, a significant breakthrough! The error is approximately one third the difference between the trapezoidal rule approximations with one subinterval and with two.

To harness this knowledge, we need to incorporate this estimate into our calculation. Suppose we wish to estimate $\int_a^b f(x)\,dx$ to within an accuracy of $tol$. We begin by calculating $T_2(a, b)$ and $T_1(a, b)$. If $\frac{1}{3}\left|T_2(a, b) - T_1(a, b)\right| < tol$, we are done. $T_2(a, b)$ is our approximation. In the more likely case that $\frac{1}{3}\left|T_2(a, b) - T_1(a, b)\right| > tol$, we divide the interval $[a, b]$ into two subintervals, $\left[a, \frac{a+b}{2}\right]$ and $\left[\frac{a+b}{2}, b\right]$, and compare our error estimates on these subintervals to $\frac{tol}{2}$. If $\frac{1}{3}\left|T_2\left(a, \frac{a+b}{2}\right) - T_1\left(a, \frac{a+b}{2}\right)\right| < \frac{tol}{2}$, we are done with the subinterval $\left[a, \frac{a+b}{2}\right]$. $T_2\left(a, \frac{a+b}{2}\right)$ is a satisfactory approximation of $\int_a^{\frac{a+b}{2}} f(x)\,dx$. If not, we bisect the interval again and compare error estimates to $\frac{tol}{4}$. On the other half of $[a, b]$, if $\frac{1}{3}\left|T_2\left(\frac{a+b}{2}, b\right) - T_1\left(\frac{a+b}{2}, b\right)\right| < \frac{tol}{2}$, we are done with the subinterval $\left[\frac{a+b}{2}, b\right]$. $T_2\left(\frac{a+b}{2}, b\right)$ is a satisfactory approximation of $\int_{\frac{a+b}{2}}^b f(x)\,dx$. If not, we bisect the interval again and compare error estimates to $\frac{tol}{4}$. Each time a subinterval fails to meet the error tolerance, we divide it in half and try again. The process will normally end successfully because, with each subinterval division, we will generally have the error decreasing by a factor of 4 while the error requirement is decreasing by a factor of only 2. In the end, the sum of the $T_2$ estimates where the error tolerance is met will be our approximation for $\int_a^b f(x)\,dx$.
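Since the whole procedure leans on the estimate $\frac{1}{3}(T_2 - T_1)$, it may be reassuring to compare that estimate with the true error in a case where the integral is known. The sketch below does so in Python (an assumption; the book uses Octave) for $\int_0^{0.75}\ln(3+x)\,dx$, an integrand and interval picked purely for illustration. The helper name `trapezoid` is likewise made up for this example.

```python
import math

def trapezoid(f, a, b, k):
    """T_k(a, b): composite trapezoidal rule with k subintervals."""
    s = (b - a) / k
    return s * ((f(a) + f(b)) / 2 + sum(f(a + i * s) for i in range(1, k)))

def f(x):
    return math.log(3 + x)

def antiderivative(x):
    # An antiderivative of ln(3 + x), used only to obtain the exact integral
    return (3 + x) * math.log(3 + x) - (3 + x)

a, b = 0.0, 0.75
exact = antiderivative(b) - antiderivative(a)

t1 = trapezoid(f, a, b, 1)
t2 = trapezoid(f, a, b, 2)
estimate = (t2 - t1) / 3   # estimated error of T_2
actual = exact - t2        # true error of T_2
print(estimate, actual)
```

The two printed numbers should agree closely here, though, as cautioned above, the agreement relies on $f''$ being reasonably tame over the interval.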

The simplest way to code this algorithm is to use a recursive function. It is possible to do without, but the record 
keeping is burdensome. Depending on the programming language you are using, the trade-off may be simplicity for 
speed. Some languages do not handle recursive functions quickly. 

Assumptions: $f$ has a continuous second derivative on $[a, b]$.

Input: Function $f$; interval over which to integrate $[a, b]$; tolerance $tol$.

Step 1: Set $m = \frac{b+a}{2}$; $I_1 = T_1(a, b)$; $I_2 = T_2(a, b)$;

Step 2: If $|I_2 - I_1| < 3\,tol$ then return $I_2$;

Step 3: Do Steps 1-5 with inputs $f$; $\left[a, \frac{a+b}{2}\right]$; and $\frac{tol}{2}$; and set $A$ equal to the result;

Step 4: Do Steps 1-5 with inputs $f$; $\left[\frac{a+b}{2}, b\right]$; and $\frac{tol}{2}$; and set $B$ equal to the result;

Step 5: Return $A + B$;

Output: Approximate value of $\int_a^b f(x)\,dx$.
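The steps above might be sketched in Python as follows. This is an illustration under stated assumptions, not the book's Octave code: the names `trapezoid` and `adaptive_trapezoid` are hypothetical, and for brevity the sketch recomputes function values that a careful implementation would pass down to the recursive calls.

```python
import math

def trapezoid(f, a, b, k):
    """T_k(a, b): composite trapezoidal rule with k subintervals."""
    s = (b - a) / k
    return s * ((f(a) + f(b)) / 2 + sum(f(a + i * s) for i in range(1, k)))

def adaptive_trapezoid(f, a, b, tol):
    """Recursive adaptive trapezoidal rule following Steps 1-5."""
    m = (a + b) / 2                                # Step 1
    i1 = trapezoid(f, a, b, 1)
    i2 = trapezoid(f, a, b, 2)
    if abs(i2 - i1) < 3 * tol:                     # Step 2: error estimate is small enough
        return i2
    left = adaptive_trapezoid(f, a, m, tol / 2)    # Step 3
    right = adaptive_trapezoid(f, m, b, tol / 2)   # Step 4
    return left + right                            # Step 5

approx = adaptive_trapezoid(lambda x: math.log(3 + x), 0.0, 3.0, 0.006)
print(approx)
```

The sample call uses $\int_0^3 \ln(3+x)\,dx$ with tolerance .006. Be aware that nothing in this sketch limits the recursion depth, so an integrand whose error estimate never drops below the shrinking tolerance would recurse until Python gives up.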

A tabulated example of such a computation might help clarify any confusion over how this algorithm works. The following table approximates the integral $\int_0^3 \ln(3+x)\,dx$ with a tolerance of .006.

    a      b      T_1(a,b)   T_2(a,b)   (1/3)|T_2(a,b) - T_1(a,b)|   tol
    0      3      4.33555    4.42389    .02944                       .00600   ✗
    0      1.5    1.95201    1.96732    .00510                       .00300   ✗
    0      0.75   0.90763    0.90997    .00077                       .00150   ✓
    0.75   1.5    1.05968    1.06124    .00051                       .00150   ✓
    1.5    3      2.47187    2.47961    .00257                       .00300   ✓

$$\int_0^3 \ln(3+x)\,dx \approx 0.90997 + 1.06124 + 2.47961 = 4.45082$$


The calculation in the table requires 7 evaluations of / and underestimates the integral by about .00390. In order 
of occurrence, the evaluations happen at x = 0, 3, 1.5, .75, .375, 1.125, 2.25. The composite trapezoidal rule with 
7 evaluations (6 subintervals each of length .5) underestimates the integral by about .00346. The non-adaptive 
composite trapezoidal rule gives a slightly better estimate with essentially the same amount of computation. But 
remember, it is not necessarily efficiency we are after. It is automatic error estimates. The adaptive trapezoidal 
rule does something the conventional composite trapezoidal rule does not. It monitors itself for accuracy, so when 
the routine completes, you not only get an estimate, but you can have some confidence in its accuracy even when 
you have no way to calculate the integral exactly for comparison. 

Key Concepts 

Composite numerical integration: Dividing the interval of integration into a number of subintervals, applying 
a simple quadrature formula to each subinterval and summing the results. 

Adaptive numerical integration: Leveraging the error term of a simple quadrature formula in order to obtain
automatic calculation of the number and nature of subintervals needed to obtain a definite integral with some
prescribed accuracy.

Exercises

1. Use the composite midpoint rule with 3 subintervals to approximate

    (a) $\int_1^3 \ln(\sin(x))\,dx$

    (b) $\int_5^7 \sqrt{x}\cos x\,dx$

    (c) $\int_1^4 \frac{e^x \ln(x)}{x}\,dx$ [A]

(d) 

/ yj 1 + cos 2 x dx 

J 10 



(e) 

/ * dx^ 

J In 3 1 + X 

(f) 

Jo x + 1 


2. Redo question 1 using the composite trapezoidal rule. [S] [A]

3. Redo question 1 using the composite Simpson’s rule. [S] [A]

4. Redo question 1 using the composite Simpson’s $\frac{3}{8}$ rule. [A]

5. Redo question 1 using the composite version of the quadrature rule

$$\int_{x_0}^{x_0+3h} f(x)\,dx \approx \frac{3h}{2}\left[f(x_0+h) + f(x_0+2h)\right].$$

6. Use a composite version of the quadrature rule

    with three subintervals to approximate

    3 x 3 - 5 -dx. x 3 + 1

7. Use the (simple) trapezoidal rule on $\int_0^\pi \sin^4 x\,dx$ to help estimate the number of intervals $[0, \pi]$ must be divided into in order to approximate $\int_0^\pi \sin^4 x\,dx$ to within $10^{-4}$ using the composite trapezoidal rule. NOTE: $\int_0^\pi \sin^4 x\,dx = \frac{3}{8}\pi$.

8. Repeat question 7 using the midpoint rule.

9. Repeat question 7 using Simpson’s rule.

10. Suppose composite Simpson’s rule with 100 subintervals was used to estimate $\int_2^3 f(x)\,dx$, and the absolute error turned out to be less than $10^{-5}$. What function might $f(x)$ have been?

11. Derive a summation formula for the composite version of

    (a) the midpoint rule.

    (b) Simpson’s rule.

    (c) Simpson’s $\frac{3}{8}$ rule.

    (d) the quadrature formula

12. Based on our discussion of composite integration, the error term for composite Simpson’s rule applied to $\int_a^b f(x)\,dx$ with $n$ subintervals is $O\left(\left(\frac{1}{n}\right)^4 f^{(4)}(\xi_n)\right)$. With a bit more work, it can be shown that the error term is actually $-\frac{(b-a)}{180}h^4 f^{(4)}(\xi_n)$ where $h = \frac{b-a}{n}$. No big-oh needed. This error is exact for some $\xi_n \in [a, b]$. Use this error term to find a theoretical bound on the error in estimating

    using (composite) Simpson’s rule with $h = 0.1$.

13. Why does the composite trapezoidal rule ALWAYS (for any $h$) give an underestimate of $\int \sin x\,dx$?

14. Demonstrate geometrically and with some words the approximation of fj d x using the composite trapezoidal rule with 4 trapezoids (that is, 4 subintervals).

15. Approximate $\int_1^3 \ln(\sin(x))\,dx$ using adaptive Simpson’s method with tolerance 0.002.

16. Use adaptive Simpson’s method to approximate $\int_0 \ln(x+1)\,dx$ accurate to within $10^{-4}$.

17. Derive a quadrature formula for $\int_a^b f(x)\,dx$ using unspecified nodes $a \le x_0 < x_1 \le b$. In other words, derive a “general trapezoidal rule” where $x_0$ and $x_1$ are allowed to be any two distinct values in $[a, b]$.

18. In your formula from question 17, make the substitutions $x_0 = a$, $x_1 = b$, and $x_1 - x_0 = h$, and show that it thus reduces to the trapezoidal rule.

19. Let $I = \int_0^2 x^2 \ln(x^2+1)\,dx$.

    (a) Approximate $I$ using the Midpoint rule.

    (b) Use your answer to (a) to estimate the number of subintervals needed to approximate $I$ to within $10^{-4}$. NOTE: $I = \frac{24\ln(5) - 6\tan^{-1}(2) - 4}{9}$.

20. Let $I = \int_0^2 x^2 \ln(x^2+1)\,dx$.

    (a) Approximate $I$ using Simpson’s rule.

    (b) Use your answer to (a) to estimate the number of subintervals needed to approximate $I$ to within $10^{-4}$. NOTE: $I = \frac{24\ln(5) - 6\tan^{-1}(2) - 4}{9}$.

21. Use Octave to calculate the estimate suggested in question 19b. Is the absolute error less than $10^{-4}$?

22. Use Octave to calculate the estimate suggested in question 20b. Is the absolute error less than $10^{-4}$?

23. Use the composite trapezoidal rule to estimate $\int_0 \ln(x+1)\,dx$ accurate to within $10^{-6}$. How many subintervals are needed?

24. Repeat question 23 using the composite midpoint rule.

25. Use composite Simpson’s rule to estimate $\int_0 \ln(x+1)\,dx$ accurate to within $10^{-6}$. How many subintervals are needed?

26. Repeat question 25 using composite Simpson’s $\frac{3}{8}$ rule. [A]

27. Write an Octave function that implements adaptive Simpson’s rule as a recursive function. Some notes about the structure:

    (a) The inputs to the function should be $f(x)$, $a$, $b$, and a maximum overall error, $tol$.

    (b) The output of the function should be the estimate and, if you are feeling particularly stirred, the number of function evaluations.

28. Use your code from question 27 to approximate $\int_1^3 \ln(\sin(x))\,dx$ with tolerance 0.002.

29. Use your code from question 27 to approximate $\int_0 \ln(x+1)\,dx$ accurate to within 10

30. (i) Use your code from question 27 to approximate the integral using $tol = 10^{-5}$. (ii) Calculate the actual error of the approximation. (iii) Is the approximation accurate to within $10^{-5}$ as requested?

    (a) $\int_0^{2\pi} x\sin(x^2)\,dx$

    (b) $\int_{0.1}^{2} \frac{1}{x}\,dx$

    (c) $\int_0^2 x^2\ln(x^2+1)\,dx$

    NOTE: $\int_0^2 x^2\ln(x^2+1)\,dx = \frac{24\ln(5) - 6\tan^{-1}(2) - 4}{9}$

31. Write an Octave function that implements the general trapezoidal rule of question 17 in such a way that $x_0$ and $x_1$ are chosen at random.

32. Write an Octave function that implements a composite version of the quadrature method in question 31.

33. Do some numerical experiments to compare the (standard) composite trapezoidal rule to the (random) composite trapezoidal rule of question 32. What do you find?

4.5 Extrapolation

In calculus, you undoubtedly encountered Euler’s constant, e, which you were probably told is approximately 2.718, 
or maybe just 2.7. And unless you were involved in a digits-of-e memorization contest, you probably never saw 
more digits of e than your calculator could show. We’re about to change that. The first 50 digits of e are 

2.7182818284590452353602874713526624977572470936999. 

How many of them do you remember? Not to worry if it is not very many. No quiz on the digits of e is imminent. 


Crumpet 29: Digits of e 


The first 1000 digits of e, 50 per line, are 

2.7182818284590452353602874713526624977572470936999 

59574966967627724076630353547594571382178525166427 

42746639193200305992181741359662904357290033429526 

05956307381323286279434907632338298807531952510190 

11573834187930702154089149934884167509244761460668 

08226480016847741185374234544243710753907774499206 

95517027618386062613313845830007520449338265602976 

06737113200709328709127443747047230696977209310141 

69283681902551510865746377211125238978442505695369 

67707854499699679468644549059879316368892300987931 

27736178215424999229576351482208269895193668033182 

52886939849646510582093923982948879332036250944311 

73012381970684161403970198376793206832823764648042 

95311802328782509819455815301756717361332069811250 

99618188159304169035159888851934580727386673858942 

28792284998920868058257492796104841984443634632449 

68487560233624827041978623209002160990235304369941 

84914631409343173814364054625315209618369088870701 

67683964243781405927145635490613031072085103837505 

10115747704171898610687396965521267154688957035035 


However, do you recall from calculus that

$$\lim_{h\to 0}(1+h)^{1/h} = e?$$

Can you prove it? Proof on page 174. Based on this fact, we might use

$$e(h) = (1+h)^{1/h}$$

to approximate e. No time like the present!


    e(0.01)     = 2.704813829421529
    e(0.005)    = 2.711517122929293
    e(0.0025)   = 2.714891744381238
    e(0.00125)  = 2.716584846682473
    e(0.000625) = 2.717432851769196.

Sadly, this sequence of approximations is not converging very quickly. We have two digits of accuracy in the first
approximation and still only three digits of accuracy in the fifth. We could, of course, continue to make h smaller to 
get more accurate approximations, but based on the slow improvement observed so far, this does not seem like a very 
promising route. Instead, we can combine the estimates we already have to get an improved approximation. This 
idea should remind you, at least on the surface, of Aitken’s delta-squared method. In that method, we combined 
three consecutive approximations to form another that was generally a better approximation than any of the original 
three. We will do something similar here, combining inadequate approximations to find better ones. We will name 
the various new approximations for continued reuse. 


$$\begin{aligned}
2e(0.005) - e(0.01) &= e_1(0.01) = 2.718220416437056 \\
2e(0.0025) - e(0.005) &= e_1(0.005) = 2.718266365833184 \\
2e(0.00125) - e(0.0025) &= e_1(0.0025) = 2.718277948983707 \\
2e(0.000625) - e(0.00125) &= e_1(0.00125) = 2.718280856855920
\end{aligned} \tag{4.5.1}$$


Each of these new approximations is accurate to 5 or 6 significant digits! Already a significant improvement. We 
can combine them further to find yet better approximations: 


$$\begin{aligned}
\frac{4e_1(0.005) - e_1(0.01)}{3} &= e_2(0.01) = 2.718281682298560 \\
\frac{4e_1(0.0025) - e_1(0.005)}{3} &= e_2(0.005) = 2.718281810033881 \\
\frac{4e_1(0.00125) - e_1(0.0025)}{3} &= e_2(0.0025) = 2.718281826146657.
\end{aligned} \tag{4.5.2}$$


The first of these approximations is accurate to seven significant digits, the second to eight, and the third to nine! 
And we can combine them further: 


$$\begin{aligned}
\frac{8e_2(0.005) - e_2(0.01)}{7} &= e_3(0.01) = 2.718281828281785 \\
\frac{8e_2(0.0025) - e_2(0.005)}{7} &= e_3(0.005) = 2.718281828448482.
\end{aligned} \tag{4.5.3}$$


Now we have approximations accurate to ten and eleven significant digits! Looking back, we took five approximations 
that had no better than 3 significant digits of accuracy and combined them to get two approximations that were 
accurate to at least 10 significant digits each. Magic! Okay, not magic, mathemagic! Here is how it works. 
Suppose we are approximating $p$ using the formula $p(h)$, and we know that

$$p(h) = p + c_1 \cdot h^{m_1} + c_2 \cdot h^{m_2} + c_3 \cdot h^{m_3} + \cdots.$$

Then

$$p(\alpha h) = p + c_1 \cdot (\alpha h)^{m_1} + c_2 \cdot (\alpha h)^{m_2} + c_3 \cdot (\alpha h)^{m_3} + \cdots.$$

Now, if we multiply the second equation by $\alpha^{-m_1}$ and subtract the first from it, the $h^{m_1}$ terms vanish, and we get an approximation with error term beginning with $c_2 \cdot h^{m_2}$:

$$\begin{aligned}
\alpha^{-m_1}p(\alpha h) &= \alpha^{-m_1}p + c_1 \cdot h^{m_1} + c_2\alpha^{m_2-m_1} \cdot h^{m_2} + c_3\alpha^{m_3-m_1} \cdot h^{m_3} + \cdots \\
-\,[\,p(h) &= p + c_1 \cdot h^{m_1} + c_2 \cdot h^{m_2} + c_3 \cdot h^{m_3} + \cdots\,] \\
\alpha^{-m_1}p(\alpha h) - p(h) &= (\alpha^{-m_1} - 1)p + c_2(\alpha^{m_2-m_1} - 1) \cdot h^{m_2} + c_3(\alpha^{m_3-m_1} - 1) \cdot h^{m_3} + \cdots
\end{aligned}$$

With a little rearranging,

$$\frac{\alpha^{-m_1}p(\alpha h) - p(h)}{\alpha^{-m_1} - 1} = p + d_2 \cdot h^{m_2} + d_3 \cdot h^{m_3} + \cdots \tag{4.5.4}$$

for some constants $d_2, d_3, \ldots$. If $m_2 > m_1$, then this method will tend to improve on the two approximations $p(h)$ and $p(\alpha h)$ by combining them into a single approximation with error commensurate with some constant multiple of $h^{m_2}$. This calculation is the basis for Richardson’s extrapolation.
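Formula 4.5.4 is short enough to transcribe directly. The following Python sketch (Python and the function name `richardson_step` are assumptions made for illustration; the book itself works in Octave) combines $p(\alpha h)$ and $p(h)$ into the improved estimate, and tries it on the values of $e(0.005)$ and $e(0.01)$ quoted earlier with $\alpha = \frac{1}{2}$ and $m_1 = 1$.

```python
def richardson_step(p_alpha_h, p_h, alpha, m1):
    """Combine p(alpha*h) and p(h) via formula 4.5.4 to cancel the h^m1 error term."""
    w = alpha ** (-m1)
    return (w * p_alpha_h - p_h) / (w - 1)

# e(0.005) and e(0.01) as quoted in the text
e1 = richardson_step(2.711517122929293, 2.704813829421529, 0.5, 1)
print(e1)
```

With these inputs the weight $\alpha^{-m_1}$ is 2, so the call reduces to $2e(0.005) - e(0.01)$, the first line of 4.5.1.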

It just so happens $e(h)$ has exactly the form needed.

$$e(h) = e + c_1h + c_2h^2 + c_3h^3 + c_4h^4 + O(h^5) \tag{4.5.5}$$

for some constants $c_1, c_2, c_3, c_4$. The actual values of the constants are not relevant for this computation. To understand the computation of $e_1$, we use equation 4.5.4 with $\alpha = \frac{1}{2}$ and $m_1 = 1$ to get

$$\begin{aligned}
e_1(h) &= \frac{2e\left(\frac{h}{2}\right) - e(h)}{2 - 1} \\
&= 2e + c_1h + \frac{1}{2}c_2h^2 + \frac{1}{4}c_3h^3 + \frac{1}{8}c_4h^4 + O(h^5) \\
&\qquad - \left[e + c_1h + c_2h^2 + c_3h^3 + c_4h^4 + O(h^5)\right] \\
&= e + d_2h^2 + d_3h^3 + d_4h^4 + O(h^5)
\end{aligned}$$

for some constants $d_2, d_3, d_4$. $e_1(h)$ is the formula that gave us the round of approximations accurate to 5 or 6 significant digits. It is not hard to find the constants $d_i$ in terms of the constants $c_i$, but, again, the values of the constants are immaterial and can only serve to complicate further refinements. What is important is the form of the error. Now that we know $e_1(h) = e + d_2h^2 + d_3h^3 + d_4h^4 + O(h^5)$, we find $e_2(h)$ using formula 4.5.4 with $\alpha = \frac{1}{2}$ and $m_1 = 2$:

$$e_2(h) = \frac{4e_1\left(\frac{h}{2}\right) - e_1(h)}{3} = e + k_3h^3 + k_4h^4 + O(h^5)$$

for some constants $k_3$ and $k_4$. $e_2(h)$ is the formula that gave us the round of approximations accurate to 7 to 9 significant digits. We can again use formula 4.5.4, this time with $\alpha = \frac{1}{2}$ and $m_1 = 3$:

$$e_3(h) = \frac{8e_2\left(\frac{h}{2}\right) - e_2(h)}{7} = e + l_4h^4 + O(h^5)$$

for some constant $l_4$. $e_3(h)$ is the formula that gave us the approximations accurate to 10 and 11 significant digits. Now is a good time to see if you can use the expression for $e_3(h)$ and formula 4.5.4 to derive an $O(h^5)$ formula for $e_4(h)$. Then use your formula to compute $e_4(0.01)$ using the previously given values of $e_3(0.01)$ and $e_3(0.005)$. How accurate is $e_4(0.01)$? Answers on page 174.

As a special case, Richardson’s extrapolation with $\alpha = \frac{1}{2}$ applied to any approximation of the form

$$p_0(h) = p + c_1h + c_2h^2 + c_3h^3 + \cdots$$

gives the recursively defined refinements

$$p_k(h) = \frac{2^k p_{k-1}\left(\frac{h}{2}\right) - p_{k-1}(h)}{2^k - 1}, \qquad k = 1, 2, 3, \ldots$$

which are expected to increase in accuracy as $k$ increases. For other $\alpha$ or other forms of error, the formula for $p_k(h)$ changes according to 4.5.4.
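The recursive definition of $p_k(h)$ translates almost word for word into code. Here is a sketch in Python (an assumption, since the book's computations are done in Octave), with the made-up names `e_approx` for $e(h) = (1+h)^{1/h}$ and `refine` for the map from $p_0$ to $p_k$. It prints each refinement at $h = 0.01$ along with its absolute error.

```python
import math

def e_approx(h):
    """The approximation e(h) = (1 + h)^(1/h)."""
    return (1 + h) ** (1 / h)

def refine(p, k):
    """Return p_k as a function of h, starting from p_0 = p and using alpha = 1/2."""
    if k == 0:
        return p
    prev = refine(p, k - 1)
    return lambda h: (2**k * prev(h / 2) - prev(h)) / (2**k - 1)

for k in range(4):
    value = refine(e_approx, k)(0.01)
    print(k, value, abs(value - math.e))
```

Each level doubles the number of evaluations of $e(h)$, since $p_k(h)$ calls $p_{k-1}$ twice. Storing the intermediate values in a table, as done by hand above, avoids the repetition.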


Crumpet 30: A Taylor polynomial for e(h) 


$e(h)$ is undefined at 0, so its derivatives at 0 are as well. However, if we extend the definition of $e(h)$ to

$$e(h) = \begin{cases} (1+h)^{1/h} & \text{if } h \ne 0 \\ e & \text{if } h \text{ is } 0, \end{cases}$$

thus defining $e(h)$ at 0, then $e(h)$ becomes infinitely differentiable at 0, and its fifth Taylor polynomial, for example, is:

$$e(h) = e - \frac{e}{2} \cdot h + \frac{11e}{24} \cdot h^2 - \frac{7e}{16} \cdot h^3 + \frac{2447e}{5760} \cdot h^4 + \frac{e^{(5)}(\xi)}{120} \cdot h^5$$

for some $\xi \in (0, h)$.

Differentiation


Using extrapolation, high order differentiation approximation formulas can be derived from low order formulas. We begin with the lowest order approximation, $f'(x_0) \approx \frac{-f(x_0) + f(x_0+h)}{h}$. The standard error term, $-\frac{h}{2}f''(\xi_h)$, does not give the error in the form $c \cdot h^{m_1} + O(h^{m_2})$ as required by Richardson’s extrapolation, so we return to Taylor series to determine the $O(h^{m_2})$ term:

$$f(x_0+h) = f(x_0) + hf'(x_0) + \frac{1}{2}h^2f''(x_0) + \frac{1}{6}h^3f'''(x_0) + \cdots$$

so

$$\frac{-f(x_0) + f(x_0+h)}{h} = f'(x_0) + \frac{1}{2}hf''(x_0) + \frac{1}{6}h^2f'''(x_0) + \cdots.$$

Hence,

$$\begin{aligned}
f'(x_0) - \frac{-f(x_0) + f(x_0+h)}{h} &= -\frac{1}{2}hf''(x_0) - \frac{1}{6}h^2f'''(x_0) - \cdots \\
&= c_1h + O(h^2)
\end{aligned}$$

and extrapolation will yield an $O(h^2)$ formula. Letting $p(h) = \frac{-f(x_0) + f(x_0+h)}{h}$, $\alpha = 2$, and $m_1 = 1$, formula 4.5.4 tells us the approximation

$$\frac{\frac{1}{2}p(2h) - p(h)}{\frac{1}{2} - 1}$$

will be an $O(h^2)$ formula for $f'(x_0)$. Simplifying,

$$\begin{aligned}
\frac{\frac{1}{2}p(2h) - p(h)}{\frac{1}{2} - 1} &= \frac{\frac{1}{2} \cdot \frac{-f(x_0) + f(x_0+2h)}{2h} - \frac{-f(x_0) + f(x_0+h)}{h}}{-\frac{1}{2}} \\
&= \frac{\frac{-f(x_0) + f(x_0+2h)}{4h} - \frac{-4f(x_0) + 4f(x_0+h)}{4h}}{-\frac{1}{2}} \\
&= \frac{\frac{3f(x_0) - 4f(x_0+h) + f(x_0+2h)}{4h}}{-\frac{1}{2}} \\
&= \frac{-3f(x_0) + 4f(x_0+h) - f(x_0+2h)}{2h}.
\end{aligned}$$

Hence, we have $f'(x_0) = \frac{-3f(x_0) + 4f(x_0+h) - f(x_0+2h)}{2h} + O(h^2)$, but this is not news. This is the first 3-point formula in table 4.2! Other high order derivative formulas can be derived by extrapolation too, but, generally, nothing new is learned from the result. We simply have a new way of deriving high order differentiation formulas.
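If the algebra feels slippery, a numerical sanity check can help. The Python sketch below (again an illustration rather than the book's Octave, with hypothetical function names and the arbitrary test case $f = \sin$, $x_0 = 1$) applies formula 4.5.4 with $\alpha = 2$ and $m_1 = 1$ to the forward difference and compares the result with the three-point formula derived above.

```python
import math

def forward_difference(f, x0, h):
    """The O(h) approximation p(h) = (-f(x0) + f(x0 + h)) / h."""
    return (f(x0 + h) - f(x0)) / h

def extrapolated(f, x0, h):
    """Formula 4.5.4 with alpha = 2 and m1 = 1 applied to the forward difference."""
    return (0.5 * forward_difference(f, x0, 2 * h) - forward_difference(f, x0, h)) / (0.5 - 1)

def three_point(f, x0, h):
    """The three-point formula (-3f(x0) + 4f(x0 + h) - f(x0 + 2h)) / (2h)."""
    return (-3 * f(x0) + 4 * f(x0 + h) - f(x0 + 2 * h)) / (2 * h)

x0 = 1.0
for h in (0.1, 0.05, 0.025):
    err = abs(extrapolated(math.sin, x0, h) - math.cos(x0))
    print(h, extrapolated(math.sin, x0, h), three_point(math.sin, x0, h), err)
```

The two middle columns should match to within rounding, and the error in the last column should drop by a factor of roughly 4 each time $h$ is halved, as expected of an $O(h^2)$ formula.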


Integration 

Applying extrapolation to definite integrals is more rewarding. We begin with any composite integration formula and apply Richardson’s extrapolation. We now consider the composite trapezoidal rule and use the notation $T_k(a, b)$ to represent the approximation of $\int_a^b f(x)\,dx$ using the trapezoidal rule with $k$ subintervals.

Before continuing we need to have a good idea what it means for the composite trapezoidal rule to have error term $O\left(\left(\frac{1}{n}\right)^2\right)$. In essence, it means we should expect the error to decrease by a factor of about 4 when the number of intervals is doubled. We should expect the error to decrease by a factor of about 9 when the number of intervals is tripled. And generally we should expect the error to decrease by a factor of about $\beta^2$ when the number of intervals is multiplied by $\beta$. To see this effect in action, consider the definite integral

$$\int_0^1 \sin x\,dx$$

whose exact value is $1 - \cos(1) \approx .4596976941318602$. The absolute errors of $T_5(0, 1)$, $T_{10}(0, 1)$, and $T_{15}(0, 1)$ are

$$\begin{aligned}
\left|\int_0^1 \sin x\,dx - T_5(0, 1)\right| &\approx 1.533(10)^{-3} \\
\left|\int_0^1 \sin x\,dx - T_{10}(0, 1)\right| &\approx 3.831(10)^{-4} \\
\left|\int_0^1 \sin x\,dx - T_{15}(0, 1)\right| &\approx 1.702(10)^{-4}
\end{aligned}$$

We should expect the error $\left|\int_0^1 \sin x\,dx - T_5(0, 1)\right|$ to be about four times the error $\left|\int_0^1 \sin x\,dx - T_{10}(0, 1)\right|$ and nine times the error $\left|\int_0^1 \sin x\,dx - T_{15}(0, 1)\right|$. To check, we compute the ratios:

$$\begin{aligned}
\frac{\left|\int_0^1 \sin x\,dx - T_5(0, 1)\right|}{\left|\int_0^1 \sin x\,dx - T_{10}(0, 1)\right|} &\approx \frac{1.533(10)^{-3}}{3.831(10)^{-4}} \approx 4.001 \\
\frac{\left|\int_0^1 \sin x\,dx - T_5(0, 1)\right|}{\left|\int_0^1 \sin x\,dx - T_{15}(0, 1)\right|} &\approx \frac{1.533(10)^{-3}}{1.702(10)^{-4}} \approx 9.007.
\end{aligned}$$

What should you expect the ratio $\frac{\left|\int_0^1 \sin x\,dx - T_{10}(0, 1)\right|}{\left|\int_0^1 \sin x\,dx - T_{15}(0, 1)\right|}$ to be about? Answer on page 174.
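These ratios are easy to reproduce. The short Python sketch below is one way to do it (Python and the helper name `trapezoid` are assumptions for illustration; the book uses Octave). It computes the absolute errors of $T_5$, $T_{10}$, and $T_{15}$ for $\int_0^1 \sin x\,dx$ and prints the two ratios discussed above.

```python
import math

def trapezoid(f, a, b, k):
    """T_k(a, b): composite trapezoidal rule with k subintervals."""
    s = (b - a) / k
    return s * ((f(a) + f(b)) / 2 + sum(f(a + i * s) for i in range(1, k)))

exact = 1 - math.cos(1)
err = {k: abs(exact - trapezoid(math.sin, 0.0, 1.0, k)) for k in (5, 10, 15)}

# Doubling the subintervals should cut the error by about 4, tripling by about 9
print(err[5] / err[10], err[5] / err[15])
```

Because the sketch uses unrounded errors, the printed ratios may differ slightly in the last displayed digit from those computed from the four-digit errors in the text.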


Finally, we apply Richardson’s extrapolation with a = | and m\ = 2 to produce the higher order estimate, 

rr , M _ 4T 2fc (a, b) - T k (a , b) 

(a,b) = • 

We defer to numerics to get a handle on the error term of the refinement T k \. We begin by collecting some data. 
Continuing with the analysis of J 0 ' sin x dx, note that 

T 5 (0,1) « .4581643459604436 
Ti O (0,l) w .4593145488579763 
T 2O (0,l) S3 .4596019197882473 
T 4 o (0,1) S3 .4596737512942187. 


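Hence,

$$T_{5,1}(0,1) = \frac{4T_{10}(0,1) - T_5(0,1)}{3} \approx .4596979498238206$$
$$T_{10,1}(0,1) = \frac{4T_{20}(0,1) - T_{10}(0,1)}{3} \approx .4596977100983375$$
$$T_{20,1}(0,1) = \frac{4T_{40}(0,1) - T_{20}(0,1)}{3} \approx .4596976951295424$$

and

$$\frac{\left|\int_0^1 \sin x\,dx - T_{5,1}(0,1)\right|}{\left|\int_0^1 \sin x\,dx - T_{10,1}(0,1)\right|} \approx 16.01$$
$$\frac{\left|\int_0^1 \sin x\,dx - T_{10,1}(0,1)\right|}{\left|\int_0^1 \sin x\,dx - T_{20,1}(0,1)\right|} \approx 16.00.$$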


When we double the number of subintervals, the error is decreased by a factor of 16. That’s $2^4$, not $2^3$ as we might
have expected! The first refinement takes us from an $O\left(\left(\frac{1}{n}\right)^2\right)$ approximation to an
$O\left(\left(\frac{1}{n}\right)^4\right)$ approximation. In other words, the error of $T_{n,1}$ is $O\left(\left(\frac{1}{n}\right)^4\right)$.

Table 4.7: Romberg’s method

    $T_1$    $T_{1,1}$    $T_{1,2}$    $T_{1,3}$
    $T_2$    $T_{2,1}$    $T_{2,2}$
    $T_4$    $T_{4,1}$
    $T_8$

Now that we know the error of $T_{n,1}$ is $O\left(\left(\frac{1}{n}\right)^4\right)$, we can extrapolate again. Applying Richardson’s extrapolation
with $\alpha = \frac{1}{2}$ and $m_1 = 4$, we have

$$T_{5,2}(0,1) = \frac{16T_{10,1}(0,1) - T_{5,1}(0,1)}{15} \approx .4596976941166387$$
$$T_{10,2}(0,1) = \frac{16T_{20,1}(0,1) - T_{10,1}(0,1)}{15} \approx .4596976941316228.$$

We now have approximations $T_{5,2}$ and $T_{10,2}$ whose errors are only about $1.522(10)^{-11}$ and $2.374(10)^{-13}$, respectively.
Use this information to calculate $T_{5,3}$ and its absolute error. Answers on page 174.

The method of combining Richardson’s extrapolation with the trapezoidal rule is known as Romberg’s method
or Romberg integration. The calculation is often tabulated for organizational purposes as in Table 4.7. Rows are
added until the differences $|T_{k,n} - T_{k,n+1}|$ and $|T_{2k,n} - T_{k,n+1}|$ are both less than some tolerance.

Though Richardson’s extrapolation may be applied to any composite integration formula, the computations
of the error terms above help explain why the trapezoidal rule is the right one to use. We might infer from our
calculations (and it can be proven true) that the error term of the composite trapezoidal rule contains only even
powers of $\frac{1}{n}$. To be explicit, we have

$$\int_a^b f(x)\,dx = T_n(a,b) + c_2\left(\frac{1}{n}\right)^2 + c_4\left(\frac{1}{n}\right)^4 + c_6\left(\frac{1}{n}\right)^6 + \cdots$$

so each refinement increases the least degree in the error term by 2, not 1. Skipping the odd degrees makes this
particular choice very efficient. But this method comes with a price. Hidden within $c_2$ is the assumption that $f$ has
a continuous second derivative. Hidden within $c_4$ is the assumption that $f$ has a continuous fourth derivative. And
so on. The accuracy of each refinement depends on $f$ having two more continuous derivatives. The more refinements
we do, the smoother $f$ must be for this method to work. For this reason, it is advisable to use Romberg’s method
only when the integrand is known to have sufficient derivatives.
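The refinements computed in this section can be organized in a few lines of code. What follows is a hedged Python sketch rather than a definitive implementation: the names `trapezoid` and `romberg_table` are hypothetical, the table is stored row by row with the first row starting at $T_k$, and a fixed number of levels is used instead of the tolerance test described above.

```python
import math

def trapezoid(f, a, b, k):
    """Composite trapezoidal rule T_k(a, b) with k equal subintervals (hypothetical helper)."""
    h = (b - a) / k
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, k)) + 0.5 * f(b))

def romberg_table(f, a, b, k, levels):
    """Build rows [T_k, T_{k,1}, ...], [T_{2k}, T_{2k,1}, ...], ...

    Column n holds the n-th Richardson refinement, computed as
    (4^n * finer - coarser) / (4^n - 1), i.e. alpha = 1/2 and m_1 = 2n.
    """
    rows = [[trapezoid(f, a, b, k * 2**r)] for r in range(levels + 1)]
    for n in range(1, levels + 1):
        for r in range(levels + 1 - n):
            coarse, fine = rows[r][n - 1], rows[r + 1][n - 1]
            rows[r].append((4**n * fine - coarse) / (4**n - 1))
    return rows

# Reproduce the running example: sin x on [0, 1] starting from 5 subintervals.
table = romberg_table(math.sin, 0, 1, 5, 3)
for row in table:
    print(row)
```

With these inputs, `table[0]` holds $T_5$, $T_{5,1}$, $T_{5,2}$, $T_{5,3}$, and `table[1]` holds $T_{10}$, $T_{10,1}$, $T_{10,2}$, so the entries can be compared directly with the values quoted in the text.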

Key Concepts 

Richardson’s extrapolation: If approximation $\tilde{p}$ is known to have the form

$$\tilde{p}(h) = p + c_1 h^{m_1} + O(h^{m_2})$$

then the approximation

$$\frac{\alpha^{-m_1}\tilde{p}(\alpha h) - \tilde{p}(h)}{\alpha^{-m_1} - 1}$$

will have error $O(h^{m_2})$.

Romberg integration: The application of Richardson’s extrapolation to the trapezoidal method.


Exercises 

1. One can use Taylor Polynomials to show that

$$\pi = \frac{1}{h}\sin(h\pi) + K_2h^2 + K_4h^4 + K_6h^6 + \cdots.$$

Therefore, $N(h) = \frac{1}{h}\sin(h\pi)$ is an $O(h^2)$ approximation
of $\pi$. Use Richardson’s extrapolation to derive an
$O(h^4)$ approximation of $\pi$. [S]
2. It is interesting to note that we can reverse engineer
Richardson refinements in order to approximate
the $c_i$ of equation 4.5.5 on page 168. For example,
$\tilde{e}(h) = e + c_1h + O(h^2)$, and we assume the $O(h^2)$ term
is relatively small, so we can rearrange this equation to
find

$$\frac{\tilde{e}(h) - e}{h} \approx c_1.$$

To take a specific example,

$$\frac{\tilde{e}(.005) - e}{.005} = \frac{2.711517122929293 - e}{.005} \approx -1.35$$

so $c_1 \approx -1.35$. If we
pay careful attention to how the constants are affected
as we refine our initial approximations, we can find $c_2$,
$c_3$, and $c_4$ as well.

$$\tilde{e}_1(h) = 2\tilde{e}\left(\frac{h}{2}\right) - \tilde{e}(h)$$
$$= 2e + c_1h + \frac{c_2}{2}h^2 + \frac{c_3}{4}h^3 + \frac{c_4}{8}h^4 + O(h^5)$$
$$\quad - \left(e + c_1h + c_2h^2 + c_3h^3 + c_4h^4 + O(h^5)\right)$$
$$= e - \frac{c_2}{2}h^2 - \frac{3c_3}{4}h^3 - \frac{7c_4}{8}h^4 + O(h^5).$$

Therefore, $\tilde{e}_1(h) - e \approx -\frac{c_2}{2}h^2$, from which we conclude

$$\frac{-2(\tilde{e}_1(h) - e)}{h^2} \approx c_2.$$

(a) Use this formula and the values in 4.5.1 to verify
that $c_2 \approx 1.24$.

(b) Approximate $c_3$ using values in 4.5.2.

(c) Approximate $c_4$ using values in 4.5.3.

(d) Compare these approximations of $c_1, c_2, c_3, c_4$ to
the exact values in crumpet 30.


3. Suppose $N$ approximates $M$ according to $N(h) = M + K_1h^3 + K_2h^5 + K_3h^7 + \cdots$. Of what order will $N_3(h)$
(the third generation Richardson’s extrapolation) be? [A]

4. Suppose $N$ approximates $M$ according to $N(h) = M + K_1h^2 + K_2h^4 + K_3h^6 + \cdots$. What would you
expect the value of

$$\frac{|M - N(h/3)|}{|M - N(h/4)|}$$

to be for small $h$, approximately? [S]

5. N(h) = 1 ~^ sh can be used to approximate ^ 

.. 1 — cos h 


(a) Compute $N(1.0)$ and $N(0.5)$.

(b) Compute $N_1(1.0)$, the first Richardson’s extrapolation,
assuming

i. $N(h)$ has an error of the form $K_1h + K_2h^2 + K_3h^3 + \cdots$

ii. $N(h)$ has an error of the form $K_2h^2 + K_4h^4 + K_6h^6 + \cdots$

(c) Which of the assumptions in part 5b do you think
gives the correct error and why?

6. The backward difference formula can be expressed as

$$f'(x_0) = \frac{1}{h}\left[f(x_0) - f(x_0 - h)\right] + \frac{h}{2}f''(x_0) - \frac{h^2}{6}f'''(x_0) + O(h^3)$$

(a) Use Richardson’s extrapolation to derive an
$O(h^2)$ formula for $f'(x_0)$.

(b) The formula you derived should look familiar.
What formula does it look like? Is it exactly the
same? Why or why not?

7. Derive an 0(h 3 ) formula for approximating M that 
uses N(h), N(^), and A r (|), and is based on the as- 
sumption that ^ 

$$M = N(h) + K_1h + K_2h^2 + K_3h^3 + \cdots.$$


8. The following data give estimates of the integral M = 

cos x dx. 

    $N(h) = 2.356194$       $N(h/2) = -0.4879837$
    $N(h/4) = -0.8815732$   $N(h/8) = -0.9709157$

Assuming $M - N(h) = K_1h^2 + K_2h^4 + K_3h^6 + \cdots$, find
a third Richardson’s extrapolation for $M$. [S]

9. Suppose that $N(h)$ is an approximation of $M$ for every
$h > 0$ and that

$$M - N(h) = K_1h + K_2h^2 + K_3h^3 + \cdots$$

for some constants $K_1, K_2, K_3, \ldots$. Use the values
$N(h)$, $N(h/3)$, and $N(h/9)$ to produce an $O(h^3)$
approximation of $M$. [S]

10. Use Romberg integration to compute the integral with
tolerance $10^{-4}$.
r3 

ln(sin(:r))cfe ^ 


(a) 

(b) 


1 ; 

L 


\J x cos x dx 
4 e*hi(xfdx [a] 

X 


(d) f 13 V r 

J 10 

[ lnr _e 
J In 3 


+ cos 2 x dx 


-dx ^ 


(e) 

(f) f 

Jo 

(g) f x 2 ln(a; 2 + 1 )dx 

Jo 


x 2 - 1 

X 2 + 1 


[A] 


11. Write a Romberg integration Octave function. [S]

12. (i) Use your code from question 11 to approximate
the integral using tol = $10^{-5}$. (ii) Calculate the actual
error of the approximation. (iii) Is the approximation
accurate to within $10^{-5}$ as requested?

r 2w 

[A] 


f 2 

(a) / a: sin (a; )dx 

Jo 

r 2 1 

(b) / — dx 

J 0.1 x 

(c) f x 2 ln(x 2 + 1) dx 

Jo 


NOTE: $\int_0^2 x^2\ln(x^2 + 1)\,dx = \frac{24\ln(5) - 6\tan^{-1}(2) - 4}{9}$

13. Compare the results of question 12 with those of question
30 on page 166.

Answers

$\lim_{h\to 0}(1 + h)^{1/h} = e$: Begin by noting $\ln\left[(1 + h)^{1/h}\right] = \frac{\ln(1 + h)}{h}$. Setting
$L = \lim_{h\to 0}\frac{\ln(1 + h)}{h}$ and applying L’Hôpital’s rule, we get

$$L = \lim_{h\to 0}\frac{\ln(1 + h)}{h} = \lim_{h\to 0}\frac{\frac{1}{1 + h}}{1} = 1.$$

Thus $L = 1$, and due to continuity of the exponential function, $e^x$,

$$e = e^L = e^{\lim_{h\to 0}\frac{\ln(1 + h)}{h}} = \lim_{h\to 0} e^{\frac{\ln(1 + h)}{h}} = \lim_{h\to 0} e^{\ln\left[(1 + h)^{1/h}\right]} = \lim_{h\to 0}(1 + h)^{1/h}.$$

$\tilde{e}_4(h)$: We use formula 4.5.4 with $\alpha = \frac{1}{2}$, $m = 4$, and $n = 5$ to find

$$\tilde{e}_4(h) = \frac{16\tilde{e}_3\left(\frac{h}{2}\right) - \tilde{e}_3(h)}{15} = e + O(h^5).$$

Applying this formula to $\tilde{e}_3(0.01)$ and $\tilde{e}_3(0.005)$ we get

$$\tilde{e}_4(0.01) = \frac{16(2.718281828448482) - 2.718281828281785}{15} = 2.718281828459595,$$

a value that is accurate to 13 significant digits!

error ratio: We should expect
$\frac{\left|\int_0^1 \sin x\,dx - T_{10}\right|}{\left|\int_0^1 \sin x\,dx - T_{15}\right|}$
to be about $1.5^2 = 2.25$ because 15 (the number of intervals used in
the approximation of the denominator) is 1.5 times 10 (the number of intervals used in the approximation of
the numerator).

$T_{5,3}$ and its error:

$$\frac{\left|\int_0^1 \sin x\,dx - T_{5,2}\right|}{\left|\int_0^1 \sin x\,dx - T_{10,2}\right|} \approx \frac{1.522(10)^{-11}}{2.374(10)^{-13}} \approx 64 = 2^6$$

so

$$T_{5,3} = \frac{64T_{10,2} - T_{5,2}}{63} \approx .4596976941318606$$

$$\left|\int_0^1 \sin x\,dx - T_{5,3}\right| \approx 4(10)^{-16}$$

More Interpolation


5.1 Osculating Polynomials 

The Taylor polynomials of Section 1.2 and interpolating polynomials of Chapter 3 represent opposite extremes in 
the spectrum of osculating polynomials. Taylor polynomials require the value of the polynomial at a single point 
while interpolating polynomials require the value of the polynomial at, generally anyway, multiple points. Taylor 
polynomials require the values of, generally anyway, multiple derivatives while interpolating polynomials do not 
allow derivative specification. 

The set of osculating polynomials contains Taylor polynomials, interpolating polynomials, and hybrids. Any 
polynomial required to pass through any set of points with any number of derivatives specified at those points is 
called an osculating polynomial. Thus a Taylor polynomial is the special case of an osculating polynomial specified 
by one point and any number of derivatives at that point. An interpolating polynomial is the special case of 
an osculating polynomial specified by any number of points and no derivatives at any point. To be precise, an
osculating polynomial is one that is required to pass through a set of points

$$(t_0, y_0), (t_1, y_1), \ldots, (t_n, y_n)$$

with the first $m_i$ derivatives specified at $(t_i, y_i)$, $i = 0, 1, \ldots, n$. As before, the $t_0, t_1, \ldots, t_n$ are called nodes.

One useful type of osculating polynomial is the Hermite polynomial in which the value of the polynomial and 
its first derivative are both given at each node. Even more specifically, third degree, or cubic, Hermite polynomials 
play an important role in approximation theory. Since a third degree polynomial has four parameters, data — the 
ordinate and first derivative — at two nodes is sufficient to specify such a polynomial. So suppose we wish to find a 
polynomial $p$ of degree at most three that passes through $(t_0, y_0)$ and $(t_1, y_1)$ with derivative $\dot{y}_0$ at $t_0$ and $\dot{y}_1$ at $t_1$.

Remembering the lessons of our study of interpolating polynomials, we might begin with the Lagrange form of
the interpolating polynomial passing through $(t_0, y_0)$ and $(t_1, y_1)$ and worry about the derivatives later. That gives
us $f(t) = \frac{t - t_1}{t_0 - t_1}y_0 + \frac{t - t_0}{t_1 - t_0}y_1$ to begin. Of course $f$ passes through the required points, but it is not even potentially
cubic, and its derivative is $f'(t) = \frac{y_1 - y_0}{t_1 - t_0}$, a constant. It would be nice if we could add to it a third degree
polynomial that has zeroes at $t_0$ and $t_1$ and whose derivatives we can control. Well, $g(t) = (t - t_0)(t - t_1)^2$, for
example, is cubic, has zeroes at $t_0$ and $t_1$, and has derivative $(t - t_1)^2 + 2(t - t_0)(t - t_1)$, so we have at least some
control over its derivative. Great, now let us look at it a little more closely:

$$g'(t) = (t - t_1)^2 + 2(t - t_0)(t - t_1) = (t - t_1)\left[(t - t_1) + 2(t - t_0)\right].$$

So $g'(t_1) = 0$ and $g'(t_0) = (t_0 - t_1)^2$ is nonzero. That should remind you of how we developed the Lagrange
interpolating polynomial. Only, there, the value of the polynomial was either 0 or 1 at each node before we added
an unknown coefficient. Of course, $g(t) = \frac{(t - t_0)(t - t_1)^2}{(t_0 - t_1)^2}$ has derivative 1 at $t_0$ and 0 at $t_1$. Putting it all together,
$g_a(t) = a\frac{(t - t_0)(t - t_1)^2}{(t_0 - t_1)^2}$ has everything we need to control the derivative at $t_0$. Similarly,
$h_b(t) = b\frac{(t - t_0)^2(t - t_1)}{(t_1 - t_0)^2}$ has everything we need to control the derivative at $t_1$. The sum of $g_a$ and $h_b$ is a degree at most three polynomial with
zeroes at $t_0$ and $t_1$ and easily specified derivatives at $t_0$ and $t_1$. Finally, a polynomial $p$ of the form


$$p(t) = \frac{t - t_1}{t_0 - t_1}y_0 + \frac{t - t_0}{t_1 - t_0}y_1 + g_a(t) + h_b(t)$$
$$= \frac{t - t_1}{t_0 - t_1}y_0 + \frac{t - t_0}{t_1 - t_0}y_1 + a\frac{(t - t_0)(t - t_1)^2}{(t_0 - t_1)^2} + b\frac{(t - t_0)^2(t - t_1)}{(t_1 - t_0)^2}$$

would be the Hermite polynomial we are after. The first two terms form the interpolating polynomial passing
through the required points. The last two terms are zero at $t_0$ and $t_1$ so do not affect this interpolation. Moreover,
the last two terms are chosen so that their derivatives are convenient at $t_0$ and $t_1$. The derivative of
$\frac{(t - t_0)(t - t_1)^2}{(t_0 - t_1)^2}$ is 1 at $t_0$ and 0 at $t_1$. The derivative of $\frac{(t - t_0)^2(t - t_1)}{(t_1 - t_0)^2}$ is 0 at $t_0$ and 1 at $t_1$. These characteristics ensure simple
values for $a$ and $b$ in terms of the specified derivatives. To find out exactly what they should be, it remains to force
$\dot{p}(t_0) = \dot{y}_0$ and $\dot{p}(t_1) = \dot{y}_1$:

$$\dot{p}(t) = \frac{y_1 - y_0}{t_1 - t_0} + a\frac{(t - t_1)^2 + 2(t - t_1)(t - t_0)}{(t_0 - t_1)^2} + b\frac{(t - t_0)^2 + 2(t - t_0)(t - t_1)}{(t_1 - t_0)^2}$$

so

$$\dot{p}(t_0) = \frac{y_1 - y_0}{t_1 - t_0} + a$$

and

$$\dot{p}(t_1) = \frac{y_1 - y_0}{t_1 - t_0} + b.$$

Therefore, we need $a = \dot{y}_0 - \frac{y_1 - y_0}{t_1 - t_0}$ and $b = \dot{y}_1 - \frac{y_1 - y_0}{t_1 - t_0}$. The desired degree at most three Hermite (osculating)
polynomial is

$$p(t) = \frac{t - t_1}{t_0 - t_1}y_0 + \frac{t - t_0}{t_1 - t_0}y_1 + \frac{(t - t_1)^2(t - t_0)}{(t_0 - t_1)^2}(\dot{y}_0 - m) + \frac{(t - t_0)^2(t - t_1)}{(t_1 - t_0)^2}(\dot{y}_1 - m) \qquad (5.1.1)$$

where $m = \frac{y_1 - y_0}{t_1 - t_0}$.

This form of the Hermite cubic polynomial is convenient for humans. It is formulaic and requires very little 
computation to write down. We will call it the Human form of the Hermite cubic polynomial. A more computer- 
friendly form, which we will refer to as the Computer form of the Hermite cubic is obtained via divided differences. 
In general, for an osculating polynomial where the first k derivatives are specified at ti, ti and yi must be repeated 
k + 1 times in the divided differences table. Quotients that would otherwise be undefined as a result of the repetition 
are replaced by the specified derivatives, first derivatives for first divided differences, second derivatives for second 
divided differences, and so on. 

For the cubic Hermite polynomial $p$ passing through $(t_0, y_0)$ and $(t_1, y_1)$ with derivative $\dot{y}_0$ at $t_0$ and $\dot{y}_1$ at $t_1$,
the table looks like so:

    $t_0$    $y_0$
    $t_0$    $y_0$    $\dot{y}_0$
    $t_1$    $y_1$    ____    ____
    $t_1$    $y_1$    $\dot{y}_1$    ____    ____

The four remaining entries are to be filled in by the usual divided difference method. Can you compute them in
general (in terms of $t_0$, $t_1$, $y_0$, $y_1$, $\dot{y}_0$, $\dot{y}_1$)? Answers on page 183. Using the results, we write down the interpolating
polynomial in two ways:

$$p(t) = y_0 + \left[\dot{y}_0\right](t - t_0) + \left[\frac{y_1 - y_0}{(t_1 - t_0)^2} - \frac{\dot{y}_0}{t_1 - t_0}\right](t - t_0)^2 + \left[\frac{\dot{y}_1 + \dot{y}_0}{(t_1 - t_0)^2} - 2\frac{y_1 - y_0}{(t_1 - t_0)^3}\right](t - t_0)^2(t - t_1)$$

and

$$p(t) = y_1 + \left[\dot{y}_1\right](t - t_1) + \left[\frac{\dot{y}_1}{t_1 - t_0} - \frac{y_1 - y_0}{(t_1 - t_0)^2}\right](t - t_1)^2 + \left[\frac{\dot{y}_1 + \dot{y}_0}{(t_1 - t_0)^2} - 2\frac{y_1 - y_0}{(t_1 - t_0)^3}\right](t - t_1)^2(t - t_0).$$
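To see that the Human form (5.1.1) and the Computer form describe the same cubic, both can be coded and compared. The Python sketch below is illustrative only: the function names `hermite_human` and `hermite_computer` and the argument order are assumptions, and the divided differences are written out by hand for the two-node case rather than computed from a general table.

```python
def hermite_human(t0, y0, dy0, t1, y1, dy1):
    """Cubic Hermite polynomial in the form of equation (5.1.1)."""
    m = (y1 - y0) / (t1 - t0)
    def p(t):
        return ((t - t1) / (t0 - t1) * y0 + (t - t0) / (t1 - t0) * y1
                + (t - t1)**2 * (t - t0) / (t0 - t1)**2 * (dy0 - m)
                + (t - t0)**2 * (t - t1) / (t1 - t0)**2 * (dy1 - m))
    return p

def hermite_computer(t0, y0, dy0, t1, y1, dy1):
    """Same cubic via the divided differences table with repeated nodes."""
    h = t1 - t0
    f11 = (y1 - y0) / h      # first divided difference across the two nodes
    f02 = (f11 - dy0) / h    # second divided differences
    f12 = (dy1 - f11) / h
    f03 = (f12 - f02) / h    # third divided difference
    def p(t):
        # Nested evaluation of y0 + dy0 (t-t0) + f02 (t-t0)^2 + f03 (t-t0)^2 (t-t1)
        return y0 + (t - t0) * (dy0 + (t - t0) * (f02 + (t - t1) * f03))
    return p

# Sample data (chosen for illustration): value -1 and slope 3 at t = 0,
# value 5 and slope 0 at t = 1.
human = hermite_human(0, -1, 3, 1, 5, 0)
computer = hermite_computer(0, -1, 3, 1, 5, 0)
print(human(0.5), computer(0.5))
```

The two functions should agree at every $t$ up to rounding, which is one way to check the four table entries you computed.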

Just as we had for interpolating polynomials, we have two ways to find cubic Hermite osculating polynomials. One
way is convenient for humans and the other for computers.

Bezier Curves

Bezier curves are parametric curves with parameter $t \in [0, 1]$ connecting two points. The simplest Bezier curve
is a straight line passing through the two points. For example, the simplest Bezier curve from $(-1, 2)$ to $(5, -2)$ is
given by the parametric linear functions

$$x(t) = (1 - t)(-1) + t(5)$$
$$y(t) = (1 - t)(2) + t(-2),$$

which we choose to write down in Lagrange form. You can check that $x(0) = -1$, $x(1) = 5$, $y(0) = 2$, and $y(1) = -2$.
In other words, $x$ passes through $(0, -1)$ and $(1, 5)$ while $y$ passes through $(0, 2)$ and $(1, -2)$. This parametrization
is unique because $x$ and $y$ are interpolating polynomials.

On the other hand, if we allow $x$ and $y$ to be quadratic, there are infinitely many (parametric) pairs of functions
connecting $(-1, 2)$ to $(5, -2)$ even if we require $x$ and $y$ to be interpolating polynomials and restrict the parameter
$t$ to the interval $[0, 1]$. That is not to say we do not have quadratic Bezier curves, but rather that we need to specify
more than just the two points to be connected. Allowing the parameter function to be quadratic, we have say

$$x(t) = a_x + b_xt + c_xt^2$$
$$y(t) = a_y + b_yt + c_yt^2,$$

giving six unknowns or undetermined coefficients, if you will. Forcing $(x(0), y(0)) = (-1, 2)$, we need

$$x(0) = a_x = -1$$
$$y(0) = a_y = 2.$$

Forcing $(x(1), y(1)) = (5, -2)$, we need

$$x(1) = a_x + b_x + c_x = -1 + b_x + c_x = 5$$
$$y(1) = a_y + b_y + c_y = 2 + b_y + c_y = -2$$

or

$$b_x + c_x = 6$$
$$b_y + c_y = -4.$$

That leaves two conditions that may yet be imposed
on the parameter functions.

Any particular quadratic Bezier curve is prescribed by specifying a control point distinct from the two endpoints.
The two linear Bezier curves, one connecting $(-1, 2)$ to the control point and the other connecting the control point
to $(5, -2)$, then determine the quadratic Bezier curve. Suppose $B_{1,0}(t)$ is the linear Bezier curve from $(-1, 2)$ to
the control point and $B_{1,1}(t)$ is the linear Bezier curve from the control point to $(5, -2)$. These two curves define
a family of linear Bezier curves, namely the set of linear Bezier curves from $B_{1,0}(t_0)$ to $B_{1,1}(t_0)$, where $t_0 \in [0, 1]$.
Letting $B_{2,0,t_0}(t)$ be the linear Bezier curve from $B_{1,0}(t_0)$ to $B_{1,1}(t_0)$, the point $B_{2,0,t_0}(t_0)$ is on the quadratic Bezier
curve from $(-1, 2)$ to $(5, -2)$ via the given control point. The collection of all such points as $t_0$ varies from 0 to 1
is the quadratic Bezier curve we are after. Different control points determine different quadratics. For example, if
we have $(0, 4)$ as our control point, $B_{1,0}$ is the linear Bezier curve connecting $(-1, 2)$ to $(0, 4)$ and $B_{1,1}$ is the linear
Bezier curve from $(0, 4)$ to $(5, -2)$:

$$B_{1,0}(t) = \begin{pmatrix} (1 - t)(-1) \\ (1 - t)(2) + t(4) \end{pmatrix} \quad\text{and}\quad B_{1,1}(t) = \begin{pmatrix} t(5) \\ (1 - t)(4) + t(-2) \end{pmatrix}.$$

$B_{2,0,t_0}$ is the linear Bezier curve connecting $B_{1,0}(t_0)$ to $B_{1,1}(t_0)$. Therefore, $B_{2,0,t_0}(t) = (1 - t)B_{1,0}(t_0) + (t)B_{1,1}(t_0)$
or

$$B_{2,0,t_0}(t) = (1 - t)\begin{pmatrix} (1 - t_0)(-1) \\ (1 - t_0)(2) + t_0(4) \end{pmatrix} + t\begin{pmatrix} t_0(5) \\ (1 - t_0)(4) + t_0(-2) \end{pmatrix}.$$

Then

$$B_{2,0,t_0}(t_0) = (1 - t_0)\begin{pmatrix} (1 - t_0)(-1) \\ (1 - t_0)(2) + t_0(4) \end{pmatrix} + t_0\begin{pmatrix} t_0(5) \\ (1 - t_0)(4) + t_0(-2) \end{pmatrix}.$$

Observe that $B_{2,0,t_0}(t_0)$ is quadratic as a function of $t_0$ and that
$B_{2,0,0}(0) = \begin{pmatrix} -1 \\ 2 \end{pmatrix}$ and $B_{2,0,1}(1) = \begin{pmatrix} 5 \\ -2 \end{pmatrix}$.
But the notation $B_{2,0,t_0}(t_0)$ is cumbersome and we are really interested in a parametrization of the quadratic
anyway. Letting $B_{2,0}(t) = B_{2,0,t}(t)$, we get the quadratic Bezier curve from $(-1, 2)$ to $(5, -2)$ via control point
$(0, 4)$:

$$B_{2,0}(t) = (1 - t)\begin{pmatrix} (1 - t)(-1) \\ (1 - t)(2) + t(4) \end{pmatrix} + t\begin{pmatrix} t(5) \\ (1 - t)(4) + t(-2) \end{pmatrix}$$

and we have cleaner notation.

With some algebra, the expression for $B_{2,0}$ can be simplified, but leaving it unsimplified emphasizes whence it
came. It is the result of nested linear interpolations. Higher order Bezier curves are constructed by continued nesting.
We now use this idea to define the Bezier curve from $P_0$ to $P_n$ via control points $P_1, P_2, \ldots, P_{n-1}$. Commonly, $P_0$
and $P_n$ are also considered control points and so this Bezier curve is also referred to as the Bezier curve with control
points $P_0, P_1, \ldots, P_n$. Such a Bezier curve will have degree at most $n$.

We begin by defining the linear Bezier curves

$$B_{1,i}(t) = (1 - t)P_i + (t)P_{i+1}, \quad i = 0, 1, \ldots, n - 1. \qquad (5.1.2)$$

Note that $B_{1,i}$ is the linear Bezier curve from $P_i$ to $P_{i+1}$. Then

$$B_{j,i}(t) = (1 - t)\cdot B_{j-1,i}(t) + (t)\cdot B_{j-1,i+1}(t), \quad j = 2, 3, \ldots, n;\ i = 0, 1, \ldots, n - j. \qquad (5.1.3)$$

Note that $B_{2,i}(t)$ is the quadratic Bezier curve connecting $P_i$ to $P_{i+2}$ via control point $P_{i+1}$. With a little algebra,
you can confirm that $B_{3,i}(t)$ is at-most-cubic and connects $P_i$ to $P_{i+3}$. An inductive proof will show that $B_{j,i}(t)$ is
an at-most-degree-$j$ polynomial parametrization connecting $P_i$ to $P_{i+j}$. Can you provide it? Answer on page 5.1.
It follows that $B_{n,0}(t)$ is the degree at most $n$ Bezier curve connecting $P_0$ to $P_n$.
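The recursion in (5.1.2) and (5.1.3) translates almost directly into code. Here is a minimal Python sketch, with the function name `bezier_point` and the tuple representation of points chosen for illustration: each pass through the loop replaces the current list of points by the linear interpolations of neighboring pairs, so after $n$ passes a single point $B_{n,0}(t)$ remains.

```python
def bezier_point(points, t):
    """Evaluate B_{n,0}(t) for control points P_0, ..., P_n by repeated linear interpolation."""
    level = [tuple(p) for p in points]          # start from the control points themselves
    while len(level) > 1:
        # One application of (5.1.2)/(5.1.3): interpolate each neighboring pair.
        level = [tuple((1 - t) * a + t * b for a, b in zip(p, q))
                 for p, q in zip(level, level[1:])]
    return level[0]

# The quadratic example from the text: from (-1, 2) to (5, -2) via control point (0, 4).
quadratic = [(-1, 2), (0, 4), (5, -2)]
print(bezier_point(quadratic, 0.0))   # starting point of the curve
print(bezier_point(quadratic, 0.5))   # a point in the middle of the curve
print(bezier_point(quadratic, 1.0))   # ending point of the curve
```

Because the function never refers to the number of control points, the same code evaluates linear, quadratic, cubic, or higher degree Bezier curves.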

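Returning to our previous example, we add the control point $(5, 1)$ so we have now four control points:

$$P_0 = \begin{pmatrix} -1 \\ 2 \end{pmatrix}, \quad P_1 = \begin{pmatrix} 0 \\ 4 \end{pmatrix}, \quad P_2 = \begin{pmatrix} 5 \\ 1 \end{pmatrix}, \quad P_3 = \begin{pmatrix} 5 \\ -2 \end{pmatrix}.$$

By equation 5.1.2,

$$B_{1,0}(t) = (1 - t)P_0 + (t)P_1 = (1 - t)\begin{pmatrix} -1 \\ 2 \end{pmatrix} + t\begin{pmatrix} 0 \\ 4 \end{pmatrix} = \begin{pmatrix} -1 + t \\ 2 + 2t \end{pmatrix}$$
$$B_{1,1}(t) = (1 - t)P_1 + (t)P_2 = (1 - t)\begin{pmatrix} 0 \\ 4 \end{pmatrix} + t\begin{pmatrix} 5 \\ 1 \end{pmatrix} = \begin{pmatrix} 5t \\ 4 - 3t \end{pmatrix}$$
$$B_{1,2}(t) = (1 - t)P_2 + (t)P_3 = (1 - t)\begin{pmatrix} 5 \\ 1 \end{pmatrix} + t\begin{pmatrix} 5 \\ -2 \end{pmatrix} = \begin{pmatrix} 5 \\ 1 - 3t \end{pmatrix}.$$

And by equation 5.1.3,

$$B_{2,0}(t) = (1 - t)B_{1,0}(t) + (t)B_{1,1}(t) = (1 - t)\begin{pmatrix} -1 + t \\ 2 + 2t \end{pmatrix} + t\begin{pmatrix} 5t \\ 4 - 3t \end{pmatrix} = \begin{pmatrix} -1 + 2t + 4t^2 \\ 2 + 4t - 5t^2 \end{pmatrix}$$
$$B_{2,1}(t) = (1 - t)B_{1,1}(t) + (t)B_{1,2}(t) = (1 - t)\begin{pmatrix} 5t \\ 4 - 3t \end{pmatrix} + t\begin{pmatrix} 5 \\ 1 - 3t \end{pmatrix} = \begin{pmatrix} 10t - 5t^2 \\ 4 - 6t \end{pmatrix},$$

and

$$B_{3,0}(t) = (1 - t)B_{2,0}(t) + (t)B_{2,1}(t) = (1 - t)\begin{pmatrix} -1 + 2t + 4t^2 \\ 2 + 4t - 5t^2 \end{pmatrix} + t\begin{pmatrix} 10t - 5t^2 \\ 4 - 6t \end{pmatrix} = \begin{pmatrix} -1 + 3t + 12t^2 - 9t^3 \\ 2 + 6t - 15t^2 + 5t^3 \end{pmatrix} \qquad (5.1.4)$$

is the cubic Bezier curve from $\begin{pmatrix} -1 \\ 2 \end{pmatrix}$ to $\begin{pmatrix} 5 \\ -2 \end{pmatrix}$ via control points
$\begin{pmatrix} 0 \\ 4 \end{pmatrix}$ and $\begin{pmatrix} 5 \\ 1 \end{pmatrix}$. Figure 5.1.1 shows this
Bezier curve and the construction of three of its points via recursive linear interpolation. The blue points lie along
the linear Bezier curves $B_{1,0}$, $B_{1,1}$, $B_{1,2}$. The orange points lie along the quadratic Bezier curves $B_{2,0}$ and $B_{2,1}$.
The black points lie along the cubic Bezier curve. The graphs of the quadratics have been suppressed to avoid
overcomplicating the figure.

Figure 5.1.1: Three points on a cubic Bezier curve constructed by recursive linear interpolation.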

Figure 5.1.1 may help you grasp the recursion, but maybe more importantly, may help you understand the
relationship between the control points and the Bezier curve. For example, upon close examination, you may be
led to believe the line segments $B_{1,0}$ and $B_{1,2}$ are tangent to the cubic Bezier curve $B_{3,0}$ at $P_0$ and $P_3$, respectively.
Close examination of the formulas will confirm it.

According to formulas 5.1.2 and 5.1.3, the (at most) cubic Bezier curve with control points $P_0, P_1, P_2, P_3$ is
computed thus:

$$B_{1,0}(t) = (1 - t)P_0 + (t)P_1$$
$$B_{1,1}(t) = (1 - t)P_1 + (t)P_2$$
$$B_{1,2}(t) = (1 - t)P_2 + (t)P_3$$

so

$$B_{2,0}(t) = (1 - t)B_{1,0}(t) + (t)B_{1,1}(t) = (1 - t)\left[(1 - t)P_0 + (t)P_1\right] + t\left[(1 - t)P_1 + (t)P_2\right]$$
$$= (1 - t)^2P_0 + 2t(1 - t)P_1 + t^2P_2$$
$$B_{2,1}(t) = (1 - t)B_{1,1}(t) + (t)B_{1,2}(t) = (1 - t)\left[(1 - t)P_1 + (t)P_2\right] + t\left[(1 - t)P_2 + (t)P_3\right]$$
$$= (1 - t)^2P_1 + 2t(1 - t)P_2 + t^2P_3$$

so

$$B_{3,0}(t) = (1 - t)B_{2,0}(t) + (t)B_{2,1}(t)$$
$$= (1 - t)\left[(1 - t)^2P_0 + 2t(1 - t)P_1 + t^2P_2\right] + t\left[(1 - t)^2P_1 + 2t(1 - t)P_2 + t^2P_3\right]$$
$$= (1 - t)^3P_0 + 3t(1 - t)^2P_1 + 3t^2(1 - t)P_2 + t^3P_3. \qquad (5.1.5)$$

Hence, $\frac{d}{dt}B_{3,0}(t) = -3(1 - t)^2P_0 + 3\left[(1 - t)^2 - 2t(1 - t)\right]P_1 + 3\left[2t(1 - t) - t^2\right]P_2 + 3t^2P_3$, from which it follows

$$\left.\frac{d}{dt}B_{3,0}(t)\right|_{t=0} = -3P_0 + 3P_1 = 3(P_1 - P_0)$$
$$\left.\frac{d}{dt}B_{3,0}(t)\right|_{t=1} = -3P_2 + 3P_3 = 3(P_3 - P_2).$$

Indeed, the derivative of $B_{3,0}$ at 0 is in the direction of the line segment from $P_0$ to $P_1$, and the derivative of $B_{3,0}$
at 1 is in the direction of the line segment from $P_2$ to $P_3$. Moreover, these derivatives have magnitude exactly three
times the lengths of the line segments.

Though we took a somewhat circuitous route, we now see another way to compute cubic Bezier curves besides
using recursion 5.1.2/5.1.3 or formula 5.1.5. Control points $P_0$ and $P_3$ give us two points $x$ and $y$ must pass through.
Control points $P_1$ and $P_2$ give us $\dot{x}$ and $\dot{y}$ at those two points. Thus specified, $x$ and $y$ are cubic Hermite polynomials!
To be precise, let $P_i = (x_i, y_i)$ for $i = 0, 1, 2, 3$. Then $x(t)$ is the cubic Hermite polynomial with $x(0) = x_0$,
$\dot{x}(0) = 3(x_1 - x_0)$, $x(1) = x_3$, and $\dot{x}(1) = 3(x_3 - x_2)$; and $y(t)$ is the cubic Hermite polynomial with $y(0) = y_0$,
$\dot{y}(0) = 3(y_1 - y_0)$, $y(1) = y_3$, and $\dot{y}(1) = 3(y_3 - y_2)$.
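The claim that each coordinate of a cubic Bezier curve is a cubic Hermite polynomial can be checked numerically. The Python sketch below is a hedged illustration: `bezier_cubic` applies formula (5.1.5) to a single coordinate, `hermite_cubic` applies equation (5.1.1) with nodes $t_0 = 0$ and $t_1 = 1$, and both names are hypothetical. The control points are the four from the running example.

```python
def bezier_cubic(p0, p1, p2, p3, t):
    """Formula (5.1.5) applied to one coordinate of the control points."""
    return ((1 - t)**3 * p0 + 3 * t * (1 - t)**2 * p1
            + 3 * t**2 * (1 - t) * p2 + t**3 * p3)

def hermite_cubic(y0, dy0, y1, dy1, t):
    """Equation (5.1.1) with t0 = 0 and t1 = 1, so m = y1 - y0."""
    m = y1 - y0
    return ((1 - t) * y0 + t * y1
            + (t - 1)**2 * t * (dy0 - m) + t**2 * (t - 1) * (dy1 - m))

xs = (-1, 0, 5, 5)    # x-coordinates of P_0, P_1, P_2, P_3
ys = (2, 4, 1, -2)    # y-coordinates of P_0, P_1, P_2, P_3

def via_hermite(c, t):
    """Use endpoint values c0, c3 and endpoint derivatives 3(c1 - c0), 3(c3 - c2)."""
    c0, c1, c2, c3 = c
    return hermite_cubic(c0, 3 * (c1 - c0), c3, 3 * (c3 - c2), t)

for k in range(5):
    t = k / 4
    print(t, bezier_cubic(*xs, t), via_hermite(xs, t),
          bezier_cubic(*ys, t), via_hermite(ys, t))
```

Each printed row should show matching pairs, one pair for $x$ and one for $y$, up to rounding.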


We close this section by computing the Bezier curve from $\begin{pmatrix} -1 \\ 2 \end{pmatrix}$ to $\begin{pmatrix} 5 \\ -2 \end{pmatrix}$ via control points
$\begin{pmatrix} 0 \\ 4 \end{pmatrix}$ and $\begin{pmatrix} 5 \\ 1 \end{pmatrix}$ using equation 5.1.1 and comparing our results to 5.1.4.
With $x(0) = -1$, $\dot{x}(0) = 3$, $x(1) = 5$, and $\dot{x}(1) = 0$ (and
the understood substitution of $x$ for $y$), equation 5.1.1 gives $m = \frac{5 - (-1)}{1 - 0} = 6$ and

$$x(t) = \frac{t - 1}{-1}(-1) + \frac{t}{1}(5) + \frac{(t - 1)^2t}{1}(3 - 6) + \frac{t^2(t - 1)}{1}(0 - 6).$$

Using equation 5.1.1 with $y(0) = 2$, $\dot{y}(0) = 6$, $y(1) = -2$, and $\dot{y}(1) = -9$ gives $m = -4$ and

$$y(t) = \frac{t - 1}{-1}(2) + \frac{t}{1}(-2) + \frac{(t - 1)^2t}{1}(6 + 4) + \frac{t^2(t - 1)}{1}(-9 + 4).$$

While these equations are complete and correct, it is difficult to compare them to 5.1.4 without some simplification.
Can you show

$$x(t) = -1 + 3t + 12t^2 - 9t^3$$
$$y(t) = 2 + 6t - 15t^2 + 5t^3$$

as required? Answer on page 183.


Crumpet 31: Bezier curves and CAGD 


Bezier curves were originally developed around 1960 by employees at French automobile manufacturing companies.
Paul de Casteljau of Citroen was first, but Pierre Bezier of Renault popularized the method so has his name 
associated with the polynomials. 

Nowadays, almost all computer aided graphic design, or CAGD, software uses Bezier curves, particularly 
cubic, for drawing smooth objects. CAGD software with cubic Bezier tools will display the four control points 
and allow the user to move them about. In fact, the software will draw the two linear Bezier curves at the 
endpoints as well. This gives the user “handles” to manipulate the curve. Some software will include the third 
linear Bezier curve as well. The three linear Bezier curves together form the so-called control polygon. Since the 
relationship between the control points and the curve is intuitive, manipulation of the control points, whether it 
be by handles or control polygons, provides a means for swift modeling of smooth shapes. 

Some shapes are too intricate to model with a single cubic Bezier curve, however. To handle such shapes, 
CAGD software allows a user to string cubic Bezier curves together end to end, forming a composite, or piecewise, 
Bezier curve, such as that shown here. 



This particular curve is made of two cubic Bezier curves, one with control points $P_0, P_1, P_2, P_3$ and the other with
control points $P_3, P_4, P_5, P_6$. Since Bezier curves are intended to model smooth objects, software will provide the
option of forcing derivative matching at a common point such as $P_3$. This is done by making sure the common
point is on the line segment between its two adjacent control points ($P_2$ and $P_4$ in this diagram). You may view
an interactive version of this diagram at the companion website.

Free open source software such as Inkscape, LibreOffice Drawing, and Dia provide Bezier curve drawing
tools, but not all of them use the technical term. Inkscape has a Bezier curve tool by that name, but LibreOffice
Drawing’s Bezier curve tool is simply called “curve”, and Dia’s tool for single Bezier curves only, not composite,
goes by the name of “Bezierline”.

References [ , 10, 9, 15, 27, 32]


Key Concepts 

osculating polynomial: A polynomial whose graph is required to pass through a set of prescribed points

$$(x_0, y_0), (x_1, y_1), \ldots, (x_n, y_n)$$

and whose first $m_i$ derivatives may also be specified at $x_i$.

Hermite polynomial: An osculating polynomial required to pass through two points with its first derivative 
specified at each point. 

Bezier curve: A curve connecting two points via parametric osculating polynomials. 


Exercises 


1. Find the cubic Hermite polynomial interpolating the
data.

    $x$    $f(x)$    $f'(x)$
    1      2         1
    5      3         -1

2. Find the Hermite polynomial of degree (at most) 5 interpolating
the data.

    $x$    $f(x)$    $f'(x)$
    0      2         1
    0.5    2         0
    1      2         1

3. Let g(x) = (f2) x . 


(a) Using xo = 1 and xi = 2, find a Hermite interpo- 
lating polynomial for g. 

(b) Use the Hermite polynomial to approximate 
g(1.5). 

(c) Calculate the actual error of this approximation, 
and compare it to the error you got in question 
15 of section 3.2 on page 116. 

(d) Which polynomial approximated $g(1.5)$ with
smaller absolute error, the Hermite or the Lagrange
interpolating polynomial?


4. Find a polynomial that passes through the points (0, 0) 
and (4, —3) and whose derivative passes through the 
points (0, 1) and (4, 1). 

5. Construct the Hermite interpolating polynomial for the
given data. Do this using a pencil, paper and calculator,
or a spreadsheet. Do not use Octave code.

    $x$    $f(x)$         $f'(x)$
    0.1    -0.29004996    -2.8019975
    0.2    -0.56079734    -2.6159201
    0.3    -0.81401972    -2.4533949

6. Find parametric equations for the cubic Bezier curve. 
The ends of the “handles” are the four control points. 



7. Write down the parametric equations of the Bezier 
curve with control points (—1,2), (—3,2), (3,1), and 
(3,0). It is not necessary to simplify your answer. 

8. Construct the parametric equations for the Bezier curve 
with control points (1, 1), (2, 1.5), (7, 1.5), (6, 2). 

9. Find equations for the cubic polynomials that make up 
the composite Bezier curve. 



10. The data in question 5 were generated using $f(x) = x^2\cos(x) - 3x$.

(a) Approximate $f(0.18)$ using the polynomial from
question 5.

(b) Calculate the absolute error of this approximation.

11. Suppose $H(x) = x^5 - 3x^4 + 2x^3 - 6x + 2$ is a Hermite
polynomial interpolating the data

    $x$    $f(x)$    $f'(x)$
    0      2         -6
    1      -4
    2      -10       2

collected from a function $f$. Find the missing datum.

12. A Hermite polynomial $H(x)$ is constructed using the
data

    $x$        0.3    0.5     0.6     0.8
    $f(x)$     0.8    0.6     0.3     0.5
    $f'(x)$    1.5    -1.2    -5.3    -2

(a) Find $(H \circ H)'(0.6)$. That is, the derivative of
$H(H(x))$ evaluated at $x = 0.6$.

(b) Find $(f \circ f)'(0.8)$.

13. The Hermite interpolating polynomial for the following
data has the form $H(x) = a_0 + a_1(x - 0.3) + a_2(x - 0.3)^2 + \ldots$.

    $x$     $f(x)$    $f'(x)$
    0.30    0.295     -0.155
    0.32    0.314     -0.149
    0.35    0.342     -0.139

(a) Fill in the missing part of the form of $H(x)$.

(b) What is the maximum possible degree of $H(x)$?

(c) Find $a_0$ and $a_1$.

14. Construct the divided differences table that led to the 
Hermite polynomial 

p{x) = 2 - (a; - 1) + *(a; - l) 2 + * (a: - l) 2 (a; - 3). 

15. The Bezier Curve

$$x(t) = 11t^3 - 18t^2 + 3t + 5$$
$$y(t) = t^3 + 1$$

has control points $(5, 1)$, $(6, 1)$, and $(1, 2)$. Find the
fourth control point.

16. What is the minimum number of cubic Bezier curves 
in the diagram, and why? 




(a) The graph can not be the graph of a single cubic 
Bezier curve. Why not? 

(b) The graph is that of a composite cubic Bezier 
curve. At least how many cubic Bezier curves 
have been spliced together, and why? 

18. Give three reasons that might make you use a Bezier 
curve rather than a Lagrange polynomial to model a 
certain graph. 

19. The osculating polynomial $p(x)$ passing through
$(x_0, f(x_0))$ with $p'(x_0) = f'(x_0)$, $p''(x_0) = f''(x_0)$,
and $p'''(x_0) = f'''(x_0)$ is also called what? Be as specific
as you can.

20. A cubic polar Bezier curve is the unique (parametrized)
cubic polar function $(r(t), \theta(t))$ satisfying the following
data.

    $t$    $r(t)$    $\theta(t)$    $\dot{r}(t)$    $\dot{\theta}(t)$
    0      $r_0$     $\theta_0$     $s_0$           $p_0$
    1      $r_1$     $\theta_1$     $s_1$           $p_1$

(a) A standard cubic Bezier curve is given by the control
points $(0, 0)$, $(2, 0)$, $(0, 1)$, and $(0, 3)$ (in that
order). Convert this data into polar coordinate
data. Recall that the conversion from Cartesian
coordinates to polar coordinates involves the formulas

$$r = \sqrt{x^2 + y^2} \quad\text{and}\quad \tan\theta = \frac{y}{x}.$$

(b) Find the cubic polar Bezier curve based on your
results from (a).

21. Write an Octave function to compute Hermite polynomials.

22. A car traveling along a straight road is clocked at
a number of points. The data from the observations
are given in the following table, where the time is in
seconds, the distance is in feet, and the speed is in feet
per second.

    Time        0     3      5      8      13
    Distance    0     225    383    623    993
    Speed       75    77     80     74     72

(a) Compute a Hermite interpolating polynomial for
the data.

(b) Use your polynomial from part (a) to predict the
position (distance) of the car and its speed when
$t = 10$ seconds.

(c) Determine whether the car ever exceeds the 55
mph speed limit on the road. If so, what is the
first time the car exceeds this speed?

(d) What is the predicted maximum speed for the
car?

NOTES: Speed is the derivative of distance.

$$55\,\frac{\text{miles}}{\text{hour}} = 55\,\frac{\text{miles}}{\text{hour}} \times \frac{5280\ \text{feet}}{\text{mile}} \times \frac{1\ \text{hour}}{3600\ \text{seconds}} \approx 80.67\,\frac{\text{feet}}{\text{second}}$$

23. Complete the following code.


#######################################
# Written by Dr. Len Brin             #
# 13 March 2012                       #
# Purpose: Evaluate an interpolating  #
#   polynomial at the value z.        #
# INPUT: number z                     #
#   Data x0, x1, ..., xn used to      #
#   calculate the polynomial: x       #
#   Entries a0;0, a1;0,1, ...         #
#   an;0,1,...,n as an array: c       #
# OUTPUT: P(z), the value of the      #
#   interpolating polynomial at z.    #
#######################################
function ans = divDiffEval(z,x,c)
  n = length(x);
  ans = c(n);
  for i=1:n-1
    ans=(z-x(???))*ans+c(???);
  end#for
end#function


Answers 

Hermite polynomial computer form: The four remaining entries are

$$\begin{aligned}
f_{1,1} &= \frac{y_1 - y_0}{t_1 - t_0} \\
f_{0,2} &= \frac{f_{1,1} - \dot{y}_0}{t_1 - t_0} = \frac{y_1 - y_0}{(t_1 - t_0)^2} - \frac{\dot{y}_0}{t_1 - t_0} \\
f_{1,2} &= \frac{\dot{y}_1 - f_{1,1}}{t_1 - t_0} = \frac{\dot{y}_1}{t_1 - t_0} - \frac{y_1 - y_0}{(t_1 - t_0)^2} \\
f_{0,3} &= \frac{f_{1,2} - f_{0,2}}{t_1 - t_0} = \frac{\dot{y}_1 + \dot{y}_0}{(t_1 - t_0)^2} - 2\,\frac{y_1 - y_0}{(t_1 - t_0)^3}
\end{aligned}$$

Bezier curve $B_{j,i}(t)$ is an at-most-degree-$j$ polynomial connecting $P_i$ to $P_{i+j}$: Proof. We proceed by
induction on $j$, beginning with $j = 1$: Since

$$B_{1,i}(t) = (1 - t)P_i + (t)P_{i+1}, \quad i = 0, 1, \ldots, n - 1,$$

$B_{1,i}(0) = P_i$ and $B_{1,i}(1) = P_{i+1}$ so $B_{1,i}$ connects $P_i$ to $P_{i+1}$. Furthermore, $B_{1,i}(t) = P_i + t(P_{i+1} - P_i)$, so $B_{1,i}$
is an at-most-degree-1 polynomial. Now assume $B_{j,i}(t)$ is an at-most-degree-$j$ polynomial connecting $P_i$ to
$P_{i+j}$ for some $j \geq 1$ (and all applicable $i$). By definition, $B_{j+1,i}(0) = B_{j,i}(0)$ and $B_{j+1,i}(1) = B_{j,i+1}(1)$. By
the inductive hypothesis, $B_{j,i}(0) = P_i$ and $B_{j,i+1}(1) = P_{i+j+1}$, so $B_{j+1,i}$ connects $P_i$ to $P_{i+j+1}$. Furthermore,

$$B_{j+1,i}(t) = (1 - t) \cdot B_{j,i}(t) + (t) \cdot B_{j,i+1}(t)$$

has degree at most $j + 1$ because $B_{j,i}(t)$ and $B_{j,i+1}(t)$ have at most degree $j$ (by the inductive hypothesis).
This completes the proof. □
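The recursion used in the proof doubles as a way to evaluate a Bezier curve numerically. What follows is only a sketch under assumptions made here: the function name bezierRecursive and the one-control-point-per-row layout are not from the text, and the control points are sample values.

```octave
1;  % script-file marker: lets Octave define functions inside a script

% Evaluate a Bezier curve at parameter t by repeating the recursion
%   B_{j+1,i}(t) = (1-t)*B_{j,i}(t) + t*B_{j,i+1}(t).
% P holds one control point per row (a layout assumed for this sketch).
function B = bezierRecursive(P, t)
  B = P;                          % start from the control points
  while rows(B) > 1
    % each pass blends neighboring rows, leaving one fewer point
    B = (1-t)*B(1:end-1,:) + t*B(2:end,:);
  end%while
end%function

P = [0 0; 2 0; 0 1; 0 3];         % sample control points
disp(bezierRecursive(P, 0));      % first control point
disp(bezierRecursive(P, 1));      % last control point
disp(bezierRecursive(P, 0.5));    % a point partway along the curve
```

Each pass of the loop raises the degree bound by one, mirroring the inductive step, and the endpoint evaluations reproduce the claim that the curve connects the first control point to the last.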

Bezier curve via Hermite cubics: The simplification may be done as follows.

$$\begin{aligned}
x(t) &= \frac{t-1}{-1}(-1) + \frac{t}{1}(5) + \frac{(t-1)^2 t}{1}(3-6) + \frac{t^2(t-1)}{1}(-6) \\
&= (t-1) + 5t - 3t(t-1)^2 - 6t^2(t-1) \\
&= 6t - 1 - 3t(t^2 - 2t + 1) - 6t^3 + 6t^2 \\
&= 6t - 1 - 3t^3 + 6t^2 - 3t - 6t^3 + 6t^2 \\
&= -9t^3 + 12t^2 + 3t - 1
\end{aligned}$$

$$\begin{aligned}
y(t) &= \frac{t-1}{-1}(2) + \frac{t}{1}(-2) + \frac{(t-1)^2 t}{1}(6+4) + \frac{t^2(t-1)}{1}(-9+4) \\
&= -2(t-1) - 2t + 10t(t-1)^2 - 5t^2(t-1) \\
&= -2t + 2 - 2t + 10t(t^2 - 2t + 1) - 5t^3 + 5t^2 \\
&= 2 - 4t + 10t^3 - 20t^2 + 10t - 5t^3 + 5t^2 \\
&= 5t^3 - 15t^2 + 6t + 2.
\end{aligned}$$
5.2 Splines

Osculating polynomials have limited use in applications where a curve is required to pass through a large number 
of points. And large may mean only a half dozen or so. Take the following innocuous-looking set of points. 

[Figure: plot of the eight data points.]


It is easy to imagine an equally innocuous function passing through these eight points, but actually finding such a 
function poses a slight challenge. The interpolating polynomial of least degree oscillates too widely. 




This is a common problem with high-degree interpolating polynomials. There is no control over their oscillations, 
and the oscillations are most often undesirable. The oscillations can be tamed to some degree by finding the 
osculating polynomial through these points with, say, a first derivative of 0 at 0 and of — f at the seventh point 
from the left (the one whose $x$-coordinate is between 5 and 6).



That’s better, but still leaves something to be desired. And the business of setting the first derivatives at two of the 
points strictly for the purpose of reducing the oscillations is a bit arbitrary— better to let the nature of the problem 
dictate. The oscillations of the previous attempts make them far too distinctive and interesting for the vapid set of 
points with which we began. A rightfully trite way to interpolate the data is by connecting consecutive points by 
line segments. 
This forms what is known as the piecewise linear interpolation of the data set. This type of graph is often seen in
public media. Many applications, especially those from engineering, require some smoothness, however. Connecting 
sets of three consecutive points by quadratic functions helps. 

[Figure: piecewise quadratic interpolation of the eight data points.]

That takes care of smoothness at three of the points, but still lacks differentiability at the points common to 
consecutive quadratics. Moreover, using the first three points for the first quadratic (which looks linear to the 
naked eye), the third through fifth points for the second quadratic, and the fifth through seventh points for the 
third quadratic (which also looks linear to the naked eye) leaves only the seventh and eighth points for what would 
presumably be a fourth quadratic. With only two points, however, a line segment is used instead. A smoother 
solution to the problem is to make sure the first derivatives of consecutive quadratics match at their common point. 
With that in mind, it makes sense to fit only two points per parabola, leaving one coefficient (of the three in any 
quadratic) for matching the derivative of the neighboring quadratic. 



That’s better! This piecewise parabolic function has continuous first derivative, but there is still something arbitrary 
about it. The seven parabolas have, all together, 21 coefficients. Making each parabola pass through two points 
gives 14 conditions on those coefficients. Having adjacent parabolas match first derivatives at their common points 
gives 6 more conditions, one at each of the 6 interior points. That leaves one “free” coefficient. Specifying one last 
condition seems a bit arbitrary, and is. The graph shows the result when the derivative at 0 is set to 1. Notice 
there is no control over the derivative at the right end. Besides the arbitrariness, this asymmetry is bothersome. If 
only we had one more degree of freedom... 
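The counting argument can be watched in action with a short sketch. It assumes a particular form for each parabola, $q_i(t) = y_i + s_i(t - x_i) + e_i(t - x_i)^2$ on $[x_i, x_{i+1}]$, which is a choice made here for convenience and not a formula from the text; the function name quadraticSpline and the data are likewise made up. Once the derivative at the left end is supplied, every remaining coefficient is forced, one piece at a time.

```octave
1;  % script-file marker: lets Octave define functions inside a script

% Quadratic spline through (x(i),y(i)) with derivative s1 at x(1).
% Piece i on [x(i),x(i+1)] is assumed to have the form
%   q_i(t) = y(i) + s(i)*(t-x(i)) + e(i)*(t-x(i))^2.
function [s, e] = quadraticSpline(x, y, s1)
  n = length(x) - 1;              % number of parabolas
  s = zeros(1, n+1);              % s(i): derivative at x(i)
  e = zeros(1, n);                % e(i): quadratic coefficient of piece i
  s(1) = s1;                      % the one "free" condition
  for i = 1:n
    h = x(i+1) - x(i);
    slope = (y(i+1) - y(i))/h;
    e(i) = (slope - s(i))/h;      % forces q_i(x(i+1)) = y(i+1)
    s(i+1) = 2*slope - s(i);      % q_i'(x(i+1)), handed to the next piece
  end%for
end%function

x = [0 1 2 3];  y = [0 1 0 1];    % made-up sample data
[s, e] = quadraticSpline(x, y, 1);
disp(s);                          % derivatives at the four points
disp(e);                          % quadratic coefficients of the pieces
```

Notice that the last entry of s, the derivative at the right end, is whatever the recursion happens to produce. That is the asymmetry complained about above.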



Piecewise polynomials 


A piecewise-defined function whose pieces are all polynomials is called a piecewise polynomial. It takes the form

$$p(x) = \begin{cases}
p_1(x), & x \in [x_0, x_1] \\
p_2(x), & x \in (x_1, x_2] \\
\quad\vdots \\
p_n(x), & x \in (x_{n-1}, x_n]
\end{cases}$$

where $p_i(x)$ is a polynomial for each $i = 1, 2, \ldots, n$ and $x_0 < x_1 < \cdots < x_n$; or some variant where $p(x_j)$ is defined
by exactly one of the $p_i$. If each $p_i$ is a linear function, $p$ is called piecewise linear. If each $p_i$ is a quadratic function,
$p$ is called piecewise quadratic. If each $p_i$ is a cubic function, $p$ is called piecewise cubic. And so on. Examples of
piecewise linear and piecewise quadratic functions appear in the introduction to this section.


Splines 

Nothing about the definition of piecewise polynomials requires one to be differentiable or even continuous. The 
following function is a piecewise polynomial. 
Most applications of piecewise polynomials require continuity or differentiability, however. Any piecewise polynomial
with at least one continuous derivative is called a spline. The points separating adjacent pieces, the $x_j$, $j =
1, 2, \ldots, n - 1$, are called knots or joints.

The last graph in the introduction to this section shows a quadratic spline. Each piece of the piecewise function 
is a quadratic, and the quadratics are chosen so that their derivatives match at the joints. As pointed out there, 
though, we needed to supply one unnatural condition — the derivative at the left endpoint. It could have been the 
derivative at any of the points, or even the second derivative at one of the points. In a very real sense, the choice 
was arbitrary. It was not governed naturally by the question at hand. Consequently, there is a family of solutions 
to the problem of connecting those eight points with a continuously differentiable piecewise quadratic. 

Cubic splines 

The most common spline in use is the cubic spline. As with the quadratic spline, a cubic spline is computed by 
matching derivatives at the joints. In fact, there are enough coefficients in the set of cubics that both first and 
second derivatives are matched. Note that, according to our definition of spline, matching both first and second 
derivatives at the joints is not strictly necessary, however. Other sources will give a more restrictive definition of 
spline where matching both derivatives is required. As a matter of convention, we focus on such splines. 

A cubic spline required to interpolate $n + 1$ points has $n - 1$ joints and $n$ pieces. It follows that the set of cubics
has $4n$ coefficients. Requiring each cubic to pass through 2 points gives $2n$ conditions on the coefficients. Requiring
first derivative matching at the joints gives $n - 1$ more conditions. Requiring second derivative matching at the
joints gives an additional $n - 1$ conditions for a grand total of $4n - 2$ conditions. That leaves 2 “free” coefficients.
Mathematically speaking, we have a family of splines with two degrees of freedom. To find any specific spline, we
need to enforce two more conditions on the coefficients. These conditions may include the first, second, or third
derivative at two of the nodes, both the first and second derivative at a single node, or some other combination of
two derivative requirements.

Guided perhaps by knowledge of draftsman’s splines, convention leads us to supply endpoint conditions. That 
is, we require something of some derivative at $x_0$ and at $x_n$. Supplying the first derivative is akin to pointing
the draftsman’s spline in a particular direction at its ends. Setting the second derivative equal to 0 is akin to
allowing the ends of a draftsman’s spline to freely point in whatever direction physics takes them. These models of 
draftsman’s splines are not particularly accurate, but they are motivational. 

A cubic spline with its first derivative specified at both endpoints is called a clamped spline. A cubic spline with 
its second derivative set equal to zero at both endpoints is called a natural or free spline. A hybrid where the first 
derivative is specified at one end and the second derivative is set to zero at the other has no special name. To be 
precise, we have the following definitions. 

Let $(x_0, y_0), (x_1, y_1), \ldots, (x_n, y_n)$ be $n + 1$ points where $x_0 < x_1 < \cdots < x_n$ and let $S_i(x) = a_i + b_i(x - x_i) +
c_i(x - x_i)^2 + d_i(x - x_i)^3$ for $i = 1, 2, \ldots, n$. Then $S$, defined by

$$S(x) = \begin{cases}
S_1(x), & x \in [x_0, x_1] \\
S_2(x), & x \in [x_1, x_2] \\
\quad\vdots \\
S_n(x), & x \in [x_{n-1}, x_n]
\end{cases}$$

is a cubic spline if it satisfies the following three conditions.

1. $S_i(x_{i-1}) = y_{i-1}$ and $S_i(x_i) = y_i$ for $i = 1, 2, \ldots, n$ (interpolation)

2. $S_i'(x_i) = S_{i+1}'(x_i)$ and $S_i''(x_i) = S_{i+1}''(x_i)$ for $i = 1, 2, \ldots, n - 1$ (derivative matching)

3. One of the following is satisfied (endpoint conditions)

(a) $S_1''(x_0) = S_n''(x_n) = 0$

(b) $S_1'(x_0) = m_0$ and $S_n'(x_n) = m_n$ for some $m_0$ and $m_n$

(c) $S_1'(x_0) = m_0$ for some $m_0$ and $S_n''(x_n) = 0$

(d) $S_1''(x_0) = 0$ and $S_n'(x_n) = m_n$ for some $m_n$

If endpoint condition 3a is satisfied, S is called a free spline or natural spline. If endpoint condition 3b is satisfied, 
S is called a clamped spline. 

The natural (cubic) spline passing through the eight points presented in the introduction to this section looks 
like this. 
Finally, a function that is as unspectacular as the data set itself! How was it calculated, you ask? The short answer
is, the 28 simultaneous equations resulting from the definition of natural cubic spline were solved. The solution
provided the coefficients $a_i, b_i, c_i, d_i$, $i = 1, 2, \ldots, 7$.


Setting up the equations 

The long answer is, well, a bit longer to tell, but really only differs from the short version in the level of detail. To
begin, the requirement that $S_i(x_i) = y_i$ immediately gives us the values of $n$ of the coefficients:

$$S_i(x_i) = a_i = y_i.$$

The requirement that $S_i(x_{i-1}) = y_{i-1}$ gives us the $n$ equations

$$S_i(x_{i-1}) = y_i + b_i(x_{i-1} - x_i) + c_i(x_{i-1} - x_i)^2 + d_i(x_{i-1} - x_i)^3 = y_{i-1} \tag{5.2.1}$$

for $i = 1, 2, \ldots, n$. The derivative requirements give us $n - 1$ equations each:

$$S_i'(x_i) = S_{i+1}'(x_i) \;\Rightarrow\; b_i = b_{i+1} + 2c_{i+1}(x_i - x_{i+1}) + 3d_{i+1}(x_i - x_{i+1})^2 \tag{5.2.2}$$

$$S_i''(x_i) = S_{i+1}''(x_i) \;\Rightarrow\; 2c_i = 2c_{i+1} + 6d_{i+1}(x_i - x_{i+1}) \tag{5.2.3}$$

for $i = 1, 2, \ldots, n - 1$. Finally, the endpoint conditions give us the two equations

$$S_1''(x_0) = 2c_1 + 6d_1(x_0 - x_1) = 0 \tag{5.2.4}$$

$$S_n''(x_n) = 2c_n = 0. \tag{5.2.5}$$

Without much ado, we have the values of the $a_i$ and of $c_n$. The remaining $3n - 1$ coefficients are found by solving
the remaining $3n - 1$ simultaneous equations. Though a computer can certainly handle the solution from here,
finding a bit of the general solution by hand gives a much more efficient algorithm.
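For the curious, here is roughly what "letting the computer handle it" might look like. This is a brute-force sketch, not the efficient algorithm derived next: it stacks equations 5.2.1 through 5.2.5 into a single linear system and solves it with Octave's backslash operator. The function name naturalSplineBruteForce and the ordering of the unknowns are assumptions made here. It keeps $c_n$ as an unknown for simplicity, so the system is $3n \times 3n$. Arrays are indexed from 1, so the text's $x_i$ is x(i+1) in the code.

```octave
1;  % script-file marker: lets Octave define functions inside a script

% Natural cubic spline by brute force: assemble equations 5.2.1-5.2.5
% as one 3n-by-3n linear system and hand it to backslash.
% Unknowns are ordered [b(1..n), c(1..n), d(1..n)]; a(i) = y(i+1) is known.
function [a,b,c,d] = naturalSplineBruteForce(x,y)
  n = length(x) - 1;
  h = x(1:n) - x(2:n+1);          % x_{i-1} - x_i for each piece (negative)
  a = y(2:n+1);
  M = zeros(3*n);  r = zeros(3*n,1);
  B = 0;  C = n;  D = 2*n;        % column offsets for the b, c, d blocks
  row = 0;
  for i = 1:n                     % 5.2.1: piece i hits its left endpoint
    row++;
    M(row,B+i) = h(i);  M(row,C+i) = h(i)^2;  M(row,D+i) = h(i)^3;
    r(row) = y(i) - y(i+1);
  end%for
  for i = 1:n-1                   % 5.2.2: first derivatives match at joints
    row++;
    M(row,B+i) = 1;  M(row,B+i+1) = -1;
    M(row,C+i+1) = -2*h(i+1);  M(row,D+i+1) = -3*h(i+1)^2;
  end%for
  for i = 1:n-1                   % 5.2.3: second derivatives match at joints
    row++;
    M(row,C+i) = 2;  M(row,C+i+1) = -2;  M(row,D+i+1) = -6*h(i+1);
  end%for
  row++;  M(row,C+1) = 2;  M(row,D+1) = 6*h(1);   % 5.2.4
  row++;  M(row,C+n) = 2;                         % 5.2.5
  u = M \ r;
  b = u(B+1:B+n)';  c = u(C+1:C+n)';  d = u(D+1:D+n)';
end%function

x = [1 2 4];  y = [1 3 2];        % sample points
[a,b,c,d] = naturalSplineBruteForce(x,y);
disp([a; b; c; d]);               % one column per piece
```

A dense solve like this costs far more than necessary as $n$ grows, which is one way to see why a little algebra by hand pays off.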


Solving the equations 

Essentially, we now have three equations with three unknowns. Equations 5.2.1, 5.2.2, and 5.2.3 are written in
the variables $b_i, c_i, d_i$. Equation 5.2.3 can easily be solved for $d_i$ in terms of the $c_i$ and equation 5.2.1 can easily be
solved for $b_i$. The resulting expressions can be substituted into equation 5.2.2 to get an equation in only the $c_i$. It is a
straightforward matter to complete the calculation. At this point, it becomes convenient to define $h_i = x_{i-1} - x_i$.

From (5.2.3):

$$d_i = \frac{c_{i-1} - c_i}{3h_i}, \quad i = 2, 3, \ldots, n \tag{5.2.6}$$

$$d_{i+1} = \frac{c_i - c_{i+1}}{3h_{i+1}}, \quad i = 1, 2, \ldots, n - 1.$$

From (5.2.1):

$$b_i = \frac{y_{i-1} - y_i}{h_i} - c_i h_i - d_i h_i^2, \quad i = 1, 2, \ldots, n$$

$$b_i = \frac{y_{i-1} - y_i}{h_i} - c_i h_i - \frac{(c_{i-1} - c_i)h_i}{3}, \quad i = 2, 3, \ldots, n$$

$$b_i = \frac{y_{i-1} - y_i}{h_i} - \frac{(c_{i-1} + 2c_i)h_i}{3}, \quad i = 2, 3, \ldots, n \tag{5.2.7}$$

$$b_{i+1} = \frac{y_i - y_{i+1}}{h_{i+1}} - \frac{(c_i + 2c_{i+1})h_{i+1}}{3}, \quad i = 1, 2, \ldots, n - 1.$$

Substituting into equation 5.2.2,

$$\frac{y_{i-1} - y_i}{h_i} - \frac{(c_{i-1} + 2c_i)h_i}{3} = \frac{y_i - y_{i+1}}{h_{i+1}} - \frac{(c_i + 2c_{i+1})h_{i+1}}{3} + 2c_{i+1}h_{i+1} + (c_i - c_{i+1})h_{i+1}$$

for $i = 2, 3, \ldots, n - 1$. With a bit of simplification, this becomes

$$h_i c_{i-1} + 2(h_i + h_{i+1})c_i + h_{i+1}c_{i+1} = 3\left(\frac{y_{i-1} - y_i}{h_i} - \frac{y_i - y_{i+1}}{h_{i+1}}\right), \quad i = 2, 3, \ldots, n - 1. \tag{5.2.8}$$

We now have $n - 2$ equations in the $n$ unknown $c_i$. These equations hold for any cubic spline with any endpoint
conditions. But equation 5.2.2 has not been used with index $i = 1$. Hence, we still have to incorporate

$$b_1 = b_2 + 2c_2h_2 + 3d_2h_2^2 \tag{5.2.9}$$

into the solution. It remains to replace $b_1$, $b_2$, and $d_2$ by expressions in the $c_i$.

To begin, equations 5.2.7 and 5.2.6 with $i = 2$ give

$$b_2 = \frac{y_1 - y_2}{h_2} - \frac{(c_1 + 2c_2)h_2}{3}$$

$$d_2 = \frac{c_1 - c_2}{3h_2}.$$

Making the substitutions for $b_2$ and $d_2$, equation 5.2.9 becomes

$$\begin{aligned}
b_1 &= \frac{y_1 - y_2}{h_2} - \frac{(c_1 + 2c_2)h_2}{3} + 2c_2h_2 + (c_1 - c_2)h_2 \\
&= \frac{y_1 - y_2}{h_2} + \frac{2}{3}h_2c_1 + \frac{1}{3}h_2c_2.
\end{aligned} \tag{5.2.10}$$

We have not used the endpoint conditions yet, so this equation is good for any cubic spline. Whatever endpoint
conditions are given must result in an expression for $b_1$ in terms of the $c_i$ plus one other equation in the $c_i$.

In the case of the free spline, endpoint condition 5.2.5 gives $c_n = 0$. This is the first of the final two equations.
Endpoint condition 5.2.4 gives $d_1 = -\frac{c_1}{3h_1}$. This relationship is not directly useful since we are looking for an
expression for $b_1$. However, equation 5.2.1 with $i = 1$ gives $b_1 = \frac{y_0 - y_1}{h_1} - c_1h_1 - d_1h_1^2$ so we can use it to find

$$b_1 = \frac{y_0 - y_1}{h_1} - \frac{2}{3}c_1h_1.$$

Finally, substituting into equation 5.2.10, the final equation in the $c_i$ is $\frac{y_0 - y_1}{h_1} - \frac{2}{3}c_1h_1 = \frac{y_1 - y_2}{h_2} + \frac{2}{3}h_2c_1 + \frac{1}{3}h_2c_2$, which
simplifies to

$$2(h_1 + h_2)c_1 + h_2c_2 = 3\left(\frac{y_0 - y_1}{h_1} - \frac{y_1 - y_2}{h_2}\right). \tag{5.2.11}$$

Equations 5.2.8, 5.2.11, and $c_n = 0$ are $n$ equations which can be solved for the $n$ coefficients $c_i$. Back-substitution
will give the values of the $b_i$ and $d_i$.

Other endpoint conditions lead to a different pair of final equations, but the process is the same. We need to
substitute an expression for $b_1$ into 5.2.10 and come up with one other equation.


Natural spline Octave code 

Computing a spline for three or four points can be done by hand with a bit of patience and attention to detail,
but with many more points the algebra becomes too tedious. However, each of the equations in the $c_i$ involves no more
than three of the $c_i$ at a time, and they appear in a regular pattern, at least for $n - 2$ of the equations. These
characteristics make automating the solution reasonably straightforward. The following code is perhaps not the
most efficient for finding a natural spline, but it is presented this way for two reasons. First, it is meant to emulate
the algebraic solution outlined in the previous section closely, making it clearer to follow. Second, it is meant to be
general enough that modifying it for other endpoint conditions would take minimal effort. Such modification will
be requested in the exercises.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Written by Dr. Len Brin             3 June 2014  %
% Purpose: Calculation of a natural cubic          %
%          spline.                                 %
% INPUT: points (x(1),y(1)), (x(2),y(2)), ...      %
%        spline must interpolate.                  %
% OUTPUT: coefficients of each piece of the        %
%         piecewise cubic spline:                  %
%         S(i,x) = a(i)                            %
%                + b(i)*(x-x(i+1))                 %
%                + c(i)*(x-x(i+1))^2               %
%                + d(i)*(x-x(i+1))^3               %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [a,b,c,d] = naturalCubicSpline(x,y)
  n=length(x)-1;
  for i=1:n
    h(i)=x(i)-x(i+1);
  end%for
  % Left endpoint condition:
  % m(1,1)*c(1) + m(1,2)*c(2) = m(1,n+1)
  m(1,1)=2*(h(1)+h(2)); m(1,2)=h(2);
  m(1,n+1)=3*((y(1)-y(2))/h(1)-(y(2)-y(3))/h(2));
  % Right endpoint condition:
  % m(n,n-1)*c(n-1) + m(n,n)*c(n) = m(n,n+1)
  m(n,n-1)=0; m(n,n)=1; m(n,n+1)=0;
  % Conditions for all splines:
  for i=2:n-1
    m(i,i-1)=h(i);
    m(i,i)=2*(h(i)+h(i+1));
    m(i,i+1)=h(i+1);
    m(i,n+1)=3*((y(i)-y(i+1))/h(i)-(y(i+1)-y(i+2))/h(i+1));
  end%for
  % Solve for c(i)
  l(1)=m(1,1); u(1)=m(1,2)/l(1); z(1)=m(1,n+1)/l(1);
  for i=2:n-1
    l(i)=m(i,i)-m(i,i-1)*u(i-1);
    u(i)=m(i,i+1)/l(i);
    z(i)=(m(i,n+1)-m(i,i-1)*z(i-1))/l(i);
  end%for
  l(n)=m(n,n)-m(n,n-1)*u(n-1);
  c(n)=(m(n,n+1)-m(n,n-1)*z(n-1))/l(n);
  for i=n-1:-1:1
    c(i)=z(i)-u(i)*c(i+1);
  end%for
  % Compute a(i), b(i), d(i)
  % Endpoint conditions:
  b(1)=(y(1)-y(2))/h(1)-2*c(1)*h(1)/3;
  d(1)=-c(1)/(3*h(1));
  % Conditions for all splines:
  a(1)=y(2);
  for i=2:n
    d(i)=(c(i-1)-c(i))/(3*h(i));
    b(i)=(y(i)-y(i+1))/h(i)-(c(i-1)+2*c(i))*h(i)/3;
    a(i)=y(i+1);
  end%for
end%function


naturalCubicSpline.m may be downloaded at the companion website.
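Coefficients are only half the story; sooner or later the spline has to be evaluated. A minimal sketch of one way to do so follows, assuming the coefficient layout used by naturalCubicSpline, where piece i covers [x(i),x(i+1)] and is centered at its right endpoint x(i+1). The helper name splineEval is hypothetical and is not part of the companion code. To keep the example self-contained, the coefficients below are those of $f(x) = x^3$ rewritten piece by piece, not the output of a spline computation.

```octave
1;  % script-file marker: lets Octave define functions inside a script

% Evaluate the piecewise cubic at z.  Piece i covers [x(i),x(i+1)] and
% is written in powers of (z - x(i+1)).  splineEval is a hypothetical helper.
function S = splineEval(z,x,a,b,c,d)
  n = length(x) - 1;
  i = find(z >= x(1:n), 1, 'last');   % last piece whose left end is <= z
  if isempty(i)
    i = 1;                            % z left of x(1): extend the first piece
  end%if
  t = z - x(i+1);
  S = a(i) + t*(b(i) + t*(c(i) + t*d(i)));   % nested (Horner) form
end%function

% Coefficients of f(x) = x^3 on [0,1] and [1,2], expanded about the
% right endpoint of each interval; used only to exercise the helper.
x = [0 1 2];
a = [1 8];  b = [3 12];  c = [3 6];  d = [1 1];
disp(splineEval(0.5,x,a,b,c,d));
disp(splineEval(1.5,x,a,b,c,d));
```

Values of z beyond the last point fall through to the last piece, so the sketch extrapolates silently. Whether that is desirable depends on the application.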
An application of natural cubic splines?

“For many important applications, this mathematical [cubic spline] model of the draftsman’s spline is highly
realistic.” 1 Claims such as this rely on the assumptions that a draftsman’s spline is aptly modeled by a thin beam and
that beam deflections are small. But the shapes modeled by splines often include large deflections, and unless the
draftsman’s spline is damaged in some way, its shape will be an infinitely differentiable curve. Cubic splines
generally lack continuity in their third derivative and hence do not have higher order derivatives. Moreover, the endpoint
conditions $S_1''(x_0) = S_n''(x_n) = 0$ do not translate well to the physical situation. These conditions imply the shape
of the spline has zero curvature (concavity) at the endpoints while nothing about the physical situation points to
that conclusion.

Despite the cubic spline’s ineffective use as a model for a draftsman’s spline, it can be used with great efficacy 
in design applications. At Boeing, the airplane manufacturer, for example, they are used in computer-aided graphic 
design, computer-aided manufacturing, engineering analysis and simulation, and as a key component in Boeing’s 
Automated Flight Manual system. By 2005, it was estimated that Boeing’s use of splines involved about 500 million 
spline evaluations every day! 2 


Exercises 

1. What problem with polynomial interpolation does cu- 
bic spline interpolation address? 

2. Write down the system of equations that would need
to be solved in order to find the cubic spline through
(0, -9), (1, -13), and (2, -29) with free boundary
conditions. Do not attempt to solve the system.

3. Set up but do not solve the equations which could be 
solved to find the free cubic spline through the points 
(1,1), (2, 3), and (4,2). 

4. List three reasons that might make you use a cubic 
spline rather than a Lagrange polynomial to model a 
certain graph. 

5. Write down a system of equations that could be solved
in order to find the free cubic spline through the
following data points. Do not solve the system.

x      f(x)
0.1    -0.62
0.2    -0.28
0.3    0.0066
0.4    0.24


6. Write down the system of equations that would need
to be solved in order to find the cubic spline through
(0, -9), (1, -13), and (2, -29) with clamped boundary
conditions $S'(0) = 1$ and $S'(2) = -1$. Do not attempt
to solve the system.

7. Set up but do not solve the equations which could be
solved to find the clamped cubic spline through the
points (1,1), (2,3), and (4,2) with $S'(1) = S'(4) = 0$. [S]

8. Write down a system of equations that could be solved
in order to find the clamped cubic spline through the
following data points with $S'(0.1) = 0.5$ and $S'(0.4) = 0.1$.
Do not solve the system.

x      f(x)
0.1    -0.62
0.2    -0.28
0.3    0.0066
0.4    0.24


9. Find the spline described in question

(a) 2 [S]

(b) 3

(c) 5 [A]

(d) 6

(e) 7 [S]

(f) 8 [A]

10. Use the Octave code presented in this section to
check your answer to question

(a) 9a [S]

(b) 9b

(c) 9c [A]

11. Modify the Octave code presented in this section
so that it computes the coefficients for a clamped cubic
spline. [S]

12. Use your code from question 11 to check your
answer to question

(a) 9d

(b) 9e [S]

(c) 9f [A]

13. Modify the Octave code presented in this section so
that it computes the coefficients for a cubic spline with 
mixed endpoint conditions 3c (page 187). 

14. Use your code from question 13 to find the cubic
spline through (0, -9), (1, -13), and (2, -29) with
mixed boundary conditions $S'(0) = 1$ and $S''(2) = 0$.

15. Use your code from question 13 to find the cubic
spline through the points (1,1), (2,3), and (4,2) with 
S"(l) = S' 77 (4) =0. 

16. Suppose n + 1 points are given ( n > 1). How many 
endpoint conditions are needed to fit the points with a 

(a) quadratic spline with first derivative matching at 
each joint? 


1 Ahlberg and Nilson, The Theory of Splines and their Applications, Elsevier, 1967. 
2 SIAM News, volume 38, number 4, May 2005. 
(b) cubic spline with first and second derivative
matching at each joint? 

(c) quartic spline with first, second, and third deriva- 
tive matching at each joint? 

(d) a degree $k$ spline ($k > 1$) with derivative matching
up to degree $k - 1$ at each joint?

17. Suppose a spline $S$ is to be fit to the four points $(x_i, y_i)$,
$i = 0, 1, 2, 3$ where $x_0 < x_1 < x_2 < x_3$. Further
suppose $S$ is to be linear on $[x_0, x_1]$, quadratic on $[x_1, x_2]$,
and cubic on $[x_2, x_3]$. Finally suppose $S$ is to have one
continuous derivative. How many endpoint conditions
are needed to specify the spline uniquely? Argue that
any such endpoint conditions must be specified at $x_3$
and not $x_0$.

18. Let $f(x) = \sin x$ and $x_0 = 0$, $x_1 = \pi/4$, $x_2 = \pi/2$,
$x_3 = 3\pi/4$, and $x_4 = \pi$.

(a) Find the cubic (clamped) spline through
$(x_0, f(x_0)), (x_1, f(x_1)), \ldots, (x_4, f(x_4))$ with
$S'(0) = f'(0)$ and $S'(\pi) = f'(\pi)$.

(b) Approximate $f(\pi/3)$ by computing $S(\pi/3)$.

(c) Approximate $f(7\pi/8)$ by computing $S(7\pi/8)$.

(d) Calculate the absolute errors in the approxima- 
tions. 



Chapter 6
Ordinary Differential Equations


The gate and key to the sciences is mathematics. 

-Roger Bacon (Opus Majus) 


If I were again beginning my studies, I would follow the advice of 

Plato and start with mathematics. 
-Galileo Galilei 


6.1 The Motion of a Pendulum 

A brief history 

Christiaan Huygens (1629-1695) is credited with inventing the pendulum clock in 1656, and Galileo Galilei (1564- 
1642) is credited with the first scientific study of the properties of pendula.[25, 33] In a famous letter to Guidobaldo 
del Monte in 1602, Galileo asserts that the period of a swinging pendulum (the time it takes to swing one way and 
back) is independent of the amplitude of the swing (how far it swings left and right). Del Monte famously argued 
that the physical evidence did not support the claim. [20] And he was right — it does not, and Galileo’s claim is 
actually false. The period of a pendulum varies with the amplitude of its swing (all else equal). 

Historians are generally willing to forgive Galileo for this error, though, likely due, in part, to the fact that the 
period of a pendulum is nearly constant for small amplitudes, and in part, to the fact that Galileo was the main 
figure in the scientific revolution (the birth of modern science) in the 17th century. His results regarding pendular 
motion account for only a small part of his total contribution to the sciences. The way he utilized idealized 
mathematical models of the physical world to inform his claims and experiments, a method of scientific study that 
directly contrasted with the generally held wisdom of his day, forms the basis for the scientific revolution, and as 
such was at least as important to science as any of his individual scientific discoveries. As for the pendulum, he 
put in motion the investigations which would one day (some years after his death) lead to a method of determining 
longitude at sea, an accomplishment that would change the world! With the ability to calculate their longitude, 
sailors were able to sail the seas, discover new places, and map the globe. Perhaps the biggest impact was the 
European colonization of foreign lands. 

The thought of a pendulum today most likely brings to mind the grandfather clock. While arguably less 
important than its contribution to science and navigation, the timekeeping accuracy that pendulum clocks brought 
to the world had a substantial impact on broad society. With accurate timekeeping, time-based labor, transit and 
trade schedules, announced starting times for religious or other meetings, and every other clock-based phenomenon 
we take for granted today became possible. In the 17th century, these things were novel. To put into some 
perspective just how important the clock, and therefore the pendulum became to society, consider Mumford’s 
claim: “the clock, not the steam-engine, is the key-machine of the modern industrial age.” [24] 




Figure 6.1.1: Free body diagram for a pendulum.



Crumpet 32: The Pendulum Clock 


Galileo never implemented the pendulum as a timekeeping mechanism. It was around 15 years after Galileo’s 
death that the pendulum clock became a reality. Even though his first pendulum clock (1656) was more accurate 
than any other clock at the time, Huygens strived to improve upon its design. During his quest, he built a clock 
with a modified pendulum and published the classic work, Horologium Oscillatorium, where mathematical details 
of the isochronism of the cycloid were laid out for the first time, in 1673. [33, 21] 

Today, we take for granted that the cycloid is the path a falling object must follow in order for its travel to a 
given point to happen in the same time regardless of its starting position. And we also take for granted that the 
period of a simple pendulum varies with its amplitude. We have over 400 years of physical and mathematical 
hindsight that tell us so! 


The equation of motion 

Hopefully having justified an interest in the pendulum, let us turn to a modern derivation of the motion of a 
pendulum by appealing to the free body diagram, a mechanical engineering mainstay. In a free body diagram, a 
body, in this case the bob of a pendulum, is isolated from everything except the forces acting on it. Those forces are 
indicated by vectors, and Newton’s second law of motion (the acceleration of an object is directly proportional to the 
magnitude of the net force applied to the object, in the same direction as the net force, and inversely proportional 
to the mass of the object, or F = ma ) is applied. Figure 6.1.1 shows the three forces acting on a pendulum — the 
force of gravity; the tension in the rod or string holding the bob to the pivot; and a third force called drag, which 
is due to air resistance — along with the directions normal (N) and tangential (T) to the path of the pendulum. 
Technically, only the bob and the three forces are part of the free body diagram; everything else is added in
dashed lines to help describe the motion. The length of the pendulum is taken to
be $\ell$, and we will apply Newton’s second law in the direction tangent to the motion. That is, in the direction T.

The speed of the bob is the product of the length of the pendulum and the angular speed, $\ell\dot\theta$. The acceleration
of the bob, the derivative of speed, is $\frac{d}{dt}(\ell\dot\theta) = \ell\ddot\theta$. Therefore, the $ma$ (mass times acceleration) term of Newton’s
second law for the motion of a pendulum is $m\ell\ddot\theta$.

Gravity causes a constant downward force on the bob with magnitude equal to the weight of the bob, $mg$. The
magnitude of this force in the T direction, however, is $mg\sin\theta$. It is worth taking a moment to make sure we have
the correct sign. For values of $\theta$ between 0 and $\pi$, the bob is to the right of the pivot, so the force of gravity tends
to accelerate the bob in the clockwise (negative with respect to $\theta$) direction. Since $mg\sin\theta$ is positive for values
of $\theta$ between 0 and $\pi$, the force due to gravity is actually $-mg\sin\theta$. For values of $\theta$ between $-\pi$ and 0, the bob
is to the left of the pivot, so the force of gravity tends to accelerate the bob in the counterclockwise (positive with
respect to $\theta$) direction. Since $mg\sin\theta$ is negative for values of $\theta$ between $-\pi$ and 0, the force due to gravity is again
$-mg\sin\theta$. Similar analysis for any other angle will lead to the same conclusion.
The damping or drag force (air resistance) is taken as a force proportional to the speed of the bob, $\ell\dot\theta$, so has
magnitude $c\ell\dot\theta$. Damping forces are always taken to directly oppose the motion, so the magnitude of damping in
the direction of T is its entirety. It only remains to choose the right sign. Since $\dot\theta$ indicates the direction of motion,
the damping force must have the opposite sign. The damping constant $c$ is taken to be positive, and of course $\ell$ is
positive, so the damping force must be $-c\ell\dot\theta$.

The tension acting on the bob is irrelevant because it is always perpendicular to the motion. The component of 
tension in the tangential direction is always zero. 

Substituting the sum of these tangential forces for F, Newton’s second law applied to the pendulum becomes
$-mg\sin\theta - c\ell\dot\theta - 0 = m\ell\ddot\theta$ or

$$\ddot\theta + \frac{c}{m}\dot\theta + \frac{g}{\ell}\sin\theta = 0. \tag{6.1.1}$$

Equation 6.1.1 is known as a differential equation because it is an equation that involves derivatives (or differentials).
To be more precise, it is a second order ordinary differential equation (o.d.e.). Second order because the highest
order derivative is the second and ordinary because it involves only one independent variable (time t).
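To make equation 6.1.1 a little more concrete, here is a hedged sketch that hands it to Octave's built-in lsode solver. Doing so requires rewriting the equation as a pair of first-derivative equations in $\theta$ and $\omega = \dot\theta$, a standard maneuver that is an assumption of this sketch and not something the derivation above requires. The function name pendulumRHS and all the numerical values are made up for illustration.

```octave
1;  % script-file marker: lets Octave define functions inside a script

% Right-hand side of equation 6.1.1 rewritten as a system.
% State u = [theta; omega] with omega = d(theta)/dt, so that
%   d(theta)/dt = omega
%   d(omega)/dt = -(c/m)*omega - (g/ell)*sin(theta).
function du = pendulumRHS(u, t, c, m, g, ell)
  theta = u(1);  omega = u(2);
  du = [omega; -(c/m)*omega - (g/ell)*sin(theta)];
end%function

% Sample values, chosen only for illustration.
c = 0.5;  m = 1;  g = 9.81;  ell = 1;
f = @(u,t) pendulumRHS(u, t, c, m, g, ell);
t = linspace(0, 10, 201);
U = lsode(f, [1; 0], t);          % release from rest at 1 radian
printf("theta(10) is approximately %g\n", U(end,1));
```

Each row of U holds the angle and angular velocity at the corresponding time in t. With a positive damping constant, the swings should shrink as time goes on.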

The simplest differential equations are considered in calculus, though the term “differential equation” is rarely
used. When first discussing the idea of antidifferentiation, the question of “What function has a derivative equal to
... ?” inevitably comes up. For example, one might be faced with the question of what function’s derivative equals
$x$? This question can also be asked, what function $y$ satisfies the (differential) equation $y' = x$? The answer can be
arrived at by integrating the equation:

$$\int y'\,dx = \int x\,dx$$

$$y = \frac{1}{2}x^2 + C$$

(don’t forget the constant of integration!).

Forces in a free body diagram 

The derivation of the equation of motion for the pendulum touches on three forces typically found in a free body 
diagram: gravity, drag, and tension. There are several other forces that may creep into a free body diagram. Most 
typical is the normal force a surface applies to a body lying upon it. In summary, here are the forces that should 
be considered when constructing a free body diagram. 

Gravity: always acts directly downward with magnitude equal to the weight of the body, mg. 

Drag: always acts directly opposite the direction of motion with a magnitude approximated in different ways 
depending on the application. This force is perhaps the most complicated to account for. It depends on 
the geometry of the body, the speed of the body, and the viscosity of the fluid relative to which the body 
moves. For slowly moving objects in low viscosity fluids, such as pendula in air, drag (air resistance) is taken 
proportional to the speed of the object. For faster moving objects in low viscosity fluids, drag is often taken 
proportional to the square of the speed of the object. In reality, drag is not exactly proportional to any 
power of speed, but rather varies in a very complicated way as the body moves through the fluid. For sake of 
tractability, though, it is almost always modeled as proportional to an appropriate power of speed. For our 
purposes, that power will simply be given. 

Tension/compression: tension is transmitted through a rope, wire, chain, or other similar object by pulling on
its ends (in opposite directions). The magnitude of the tension is constant within the object assuming, as 
we often do, that the rope, wire, or chain is massless. Tension is always directed along the rope, wire, or 
chain. The opposite of tension is compression. Rigid objects such as rods, dowels, or poles are capable of 
transmitting compressive forces by pushing on their ends. Ropes, wires, chains, and other objects that simply 
slacken when pushed are not capable of transmitting compression. 

Spring: a spring exerts a force proportional to the deflection of the spring, in the direction opposite the deflection. 

Normal: when a body lies atop a solid surface and the body is not floating away from the surface nor sinking into 
the surface, there must be a balance between the forces perpendicular (normal) to the surface. The force that 
the surface applies to a body to keep it from sinking into the surface is called the normal force and always 
acts normal (perpendicular) to and away from the surface. The magnitude of the normal force is always equal 
to the net magnitude of all other forces in the normal direction. Often the normal component of gravity is 
the only other force acting normal to the surface. 

Friction: when a body lies in contact with a surface, friction opposes motion with a magnitude proportional to the
normal force. The constant of proportionality is called the coefficient of friction and is denoted by $\mu$. For any
body/surface combination, there are two types of friction to consider — static friction and kinetic friction. A 
body at rest on a surface is capable of resisting a greater force than is the same body sliding across the same 
surface (with the same normal force). You may be familiar with this phenomenon if you’ve ever tried to slide 
an oven into or out of its usual position in a kitchen. It’s much harder to get it started moving than it is to 
keep it moving. Whether the friction is static or kinetic, it always resists motion tangential to the surface. 

Applied: a force that is applied to a body by another body, such as a person pushing a sofa or an engine accelerating 
a vehicle. 


Crumpet 33: Anti-lock braking systems 


The anti-lock braking system (ABS) of an automobile is designed to take advantage of the fact that the static 
friction between a tire and the road can stop a car more quickly than the kinetic friction between the same tire 
and the same road. A tire that is not skidding is capable of applying a greater braking (frictional) force than the 
same tire skidding. When the ABS senses that a wheel has locked (ceased rotation) while the car is still moving, 
it forces the driver to let up on the brake enough so the wheel will start spinning again, though very briefly. If 
the driver continues to hold down the brake hard enough to skid, the ABS will force the driver to let up again. 
The ABS rapidly alternates between forcing the driver to let up and allowing the driver to do as (s)he will. The 
quick alternation between making the driver let up and allowing the driver to brake hard is what causes the 
vibration or pulsing you feel when the ABS kicks in. If the ABS is working properly, a vehicle will come to a 
halt more quickly than it would have if it were allowed to skid to a stop. Also, it’s much easier to steer a car 
when it is not skidding than when it is skidding! 


Solutions of ordinary differential equations 

The solution of a differential equation is, in one way, very much like the solution of an algebraic equation but, in
another way, entirely different. For an algebraic equation in $x$, for example, we say that we have a solution $x = s$ if
substituting $s$ for $x$ in the equation makes the equation true. Likewise, for a differential equation in $\theta$, for example,
we say that we have a solution $\theta = s$ if substituting $s$ for $\theta$ in the equation makes the equation true. The difference
is $s$ is a number in the case of an algebraic equation while $s$ is a function in the case of a differential equation. We
would say that $x = 2$ is a solution of the algebraic equation $3x^2 - 8x + 4 = 0$ since substituting 2 for $x$ gives

$$3(2)^2 - 8(2) + 4 = 0,$$

a true statement. Analogously, we would say that $\theta = e^{2t}$ is a solution of the differential equation
$3\ddot\theta - 8\dot\theta + 4\theta = 0$ since substituting $e^{2t}$ for $\theta$ gives

$$3(4e^{2t}) - 8(2e^{2t}) + 4(e^{2t}) = 0,$$

again a true statement. Notice that the derivatives $\dot\theta$ and $\ddot\theta$ need to be calculated in order to complete the
substitution.

Approximate solutions of differential equations, then, must be approximations of functions. In fact, for any
given ode, we settle for the crudest approximation, a set of points that, if our approximation is good, lie near the
graph of an exact solution. Hence the set $\{(0, 1), (.25, 1.5), (.5, 2.25), (.75, 3.375), (1, 5.0625)\}$ might qualify as an
approximate solution of the equation $3\ddot\theta - 8\dot\theta + 4\theta = 0$ for $t \in [0, 1]$. See figure 6.1.2. The approximation is good
for values of $t$ near zero but not as good for values of $t$ near 1.
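
The quality of such a set of points can be checked by comparing each point with the exact solution $\theta = e^{2t}$. The following is only a sketch, written in Python rather than the Octave used elsewhere in this text; the names `approx` and `exact` are chosen here for illustration.

```python
import math

# Approximate solution from the text: points (t, theta) meant to lie near theta = e^(2t)
approx = [(0, 1), (0.25, 1.5), (0.5, 2.25), (0.75, 3.375), (1, 5.0625)]

def exact(t):
    """Exact solution theta(t) = e^(2t) of 3*theta'' - 8*theta' + 4*theta = 0."""
    return math.exp(2 * t)

# Absolute error of each approximate point
errors = [abs(exact(t) - theta) for t, theta in approx]

for (t, theta), err in zip(approx, errors):
    print(f"t = {t:<5} approx = {theta:<7} exact = {exact(t):.4f}  error = {err:.4f}")
```

The error is zero at $t = 0$ and grows steadily as $t$ approaches 1, which matches the remark that the approximation is better near zero.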

Initial Value Problems 

As with algebraic equations, differential equations may have more than one solution. We already saw that $\theta = e^{2t}$
is a solution of $3\ddot\theta - 8\dot\theta + 4\theta = 0$. So are $\theta = 5e^{2t}$, $\theta = -2.1e^{2t}$, and $\theta = \sqrt{\pi}\,e^{2t}$. In fact, $\theta = ce^{2t}$ is a solution for
any constant $c$. The ode $3\ddot\theta - 8\dot\theta + 4\theta = 0$ has infinitely many solutions! It is a straightforward exercise to check.
For $\theta = ce^{2t}$, $\dot\theta = 2ce^{2t}$ and $\ddot\theta = 4ce^{2t}$, so

$$\begin{aligned}
3\ddot\theta - 8\dot\theta + 4\theta &= 3(4ce^{2t}) - 8(2ce^{2t}) + 4(ce^{2t}) \\
&= 12c(e^{2t}) - 16c(e^{2t}) + 4c(e^{2t}) \\
&= (12c - 16c + 4c)e^{2t} \\
&= 0.
\end{aligned}$$

Figure 6.1.2: Approximate solution of $3\ddot\theta - 8\dot\theta + 4\theta = 0$.

Even more, $\theta = ae^{2t/3}$ is a solution for any constant $a$. This solution can be verified just as the solution $\theta = ce^{2t}$
was verified. Can you do it? Answer on page 199. Finally, $\theta = ce^{2t} + ae^{2t/3}$ is also a solution for any pair of
constants $c$ and $a$! Can you show it? Answer on page 200. It is not uncommon for a differential equation to have
infinitely many solutions.
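
A numerical spot check can support (though not replace) the algebra. The sketch below, in Python rather than Octave, evaluates the left side $3\ddot\theta - 8\dot\theta + 4\theta$ for $\theta = ce^{2t} + ae^{2t/3}$ at a few values of $t$. The function name `residual` and the sample constants are illustrative choices, and the derivatives are entered by hand, so this only confirms the arithmetic.

```python
import math

def residual(c, a, t):
    """Evaluate 3*theta'' - 8*theta' + 4*theta for theta = c*e^(2t) + a*e^(2t/3)."""
    theta = c * math.exp(2 * t) + a * math.exp(2 * t / 3)
    dtheta = 2 * c * math.exp(2 * t) + (2 / 3) * a * math.exp(2 * t / 3)
    ddtheta = 4 * c * math.exp(2 * t) + (4 / 9) * a * math.exp(2 * t / 3)
    return 3 * ddtheta - 8 * dtheta + 4 * theta

# Sample (c, a) pairs, including the solutions named in the text
for c, a in [(1, 0), (5, 0), (-2.1, 0), (0, 1), (3, -7)]:
    worst = max(abs(residual(c, a, t)) for t in [0, 0.25, 0.5, 0.75, 1])
    print(f"c = {c}, a = {a}: largest |residual| = {worst:.2e}")
```

Every residual should be zero up to floating point round-off, whatever constants are chosen.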

Another differential equation with infinitely many solutions is

$$\dot y = \frac{t}{y}.$$

The solutions are $y = \sqrt{t^2 + c}$ and $y = -\sqrt{t^2 + a}$, valid for any constants $c$ and $a$ as long as $y \neq 0$. Complex
solutions are valid! However, if we also require $y(0) = 1$, there is only one solution! $y = -\sqrt{t^2 + a}$ is no longer a
solution because it gives negative values of $y$ for all values of $t$. And $y = \sqrt{t^2 + c}$ is only a solution if $c = 1$. The
one and only solution is $y = \sqrt{t^2 + 1}$.

The requirement $y(0) = 1$ is called an initial value, or initial condition, and the pair of equations

$$\dot y = \frac{t}{y}$$
$$y(0) = 1$$

is called an initial value problem. More generally, the pair of equations

$$\dot y = f(y, t)$$
$$y(t_0) = y_0$$

forms what is known as a first order initial value problem.


Crumpet 34: There is exactly one solution of $\dot y = \frac{t}{y}$ such that $y(0) = 1$.


Setting $y = \sqrt{t^2 + 1}$, $\dot y = \frac{1}{2}(t^2 + 1)^{-1/2}(2t) = \frac{t}{\sqrt{t^2 + 1}}$. Hence the equation $\dot y = \frac{t}{y}$ becomes

$$\frac{t}{\sqrt{t^2 + 1}} = \frac{t}{\sqrt{t^2 + 1}},$$

an undeniably true statement. Hence $y = \sqrt{t^2 + 1}$ is a solution of $\dot y = \frac{t}{y}$. Moreover $y(0) = \sqrt{0^2 + 1} = 1$, so
the particular solution $y = \sqrt{t^2 + 1}$ satisfies the requirement that $y(0) = 1$ also. Hence $y = \sqrt{t^2 + 1}$ is one
solution, and the only solution of the form $y = \sqrt{t^2 + c}$ or $y = -\sqrt{t^2 + a}$. But is it the only solution of any
form? Perhaps there are other functions that satisfy the differential equation. A little bit of calculus should help
settle the issue. The demonstration hinges on showing that $y = \sqrt{t^2 + c}$ and $y = -\sqrt{t^2 + a}$ are the only solutions
of $\dot y = \frac{t}{y}$. The following sequence of equations shows it. Each line implies the next.

$$\begin{aligned}
\frac{dy}{dt} &= \frac{t}{y}, & y &\neq 0 \\
y\,dy &= t\,dt, & y &\neq 0 \\
\int y\,dy &= C + \int t\,dt, & y &\neq 0 \\
\tfrac{1}{2}y^2 &= C + \tfrac{1}{2}t^2, & y &\neq 0 \\
y^2 &= 2C + t^2, & y &\neq 0 \\
y &= \pm\sqrt{t^2 + 2C}, & y &\neq 0.
\end{aligned}$$

Replacing the constant $2C$ with $c$ or $a$ does not change the fact that the term is an arbitrary constant, so
$y = \sqrt{t^2 + c}$ and $y = -\sqrt{t^2 + a}$ are the only solutions of $\dot y = \frac{t}{y}$. This method of solving the differential equation
is called separation of variables.


Key Concepts 

Approximate solution of a differential equation: a set of points that, ideally, lie near the graph of an exact 
solution. 

Degree of a differential equation: equal to the highest order derivative appearing in the equation. 

Differential equation: an equation with derivatives (or differentials) in it. 

Free body diagram: An engineering diagram consisting of only a body and the forces acting on it. 

Initial value problem: a differential equation coupled with a required value of the solution. 

Newton’s second law of motion: the acceleration of an object is directly proportional to the magnitude of the 
net force applied to the object, in the same direction as the net force, and inversely proportional to the mass 
of the object — often summarized by the equation F = ma. This equation assumes the mass of the object is 
constant. 

Ordinary differential equation (o.d.e.): a differential equation with only one independent variable. 

Solution of a differential equation: a function that, when substituted for the dependent variable, makes the 
equation a true statement. 


Exercises 

1. State the degree of the differential equation.

(a) $\dot y = y$ [A]

(b) $y'' = 6x + \sin x$

(c) $\ddot s + \dot s + s = 0$ [A]

(d) $f' + \frac{f}{x} = x^2$

(e) $(2h + x)h' + h = 4x$

(f) $\dot r\,\ddot r\,t^2 = -\frac{1}{8}$

2. Verify that the function is a solution of the differential 
equation. 


(a) 

y{t) ■ 

II 

0> 

*5 

jE 

(b) $y(x) = x^3 - 26.83x - \sin x$; $y'' = 6x + \sin x$

(c) $s(t) = e^{-t/2}\sin\left(\frac{\sqrt{3}}{2}t\right)$; $\ddot s + \dot s + s = 0$

(d) 

fix) 

= t + §-* > 0; /' + i = x 2 [s) 

(e) $h(x) = -2x$; $(2h + x)h' + h = 4x$

(f) $r(t) = \sqrt{t}$, $t > 0$; $\dot r\,\ddot r\,t^2 = -\frac{1}{8}$ [A]


3. Verify that the function is a solution of the initial value 
problem. 

(a) $y(t) = 4e^t$; $\dot y = y$, $y(0) = 4$ [A]

(b) $y(x) = x^3 - \sin x - \pi^3$; $y' = 3x^2 - \cos x$, $y(\pi) = 0$

(c) $s(t) = \frac{1}{2}\left(1 + e^{-t^2}\right)$; $\dot s = (1 - 2s)t$, $s(0) = 1$ [A]

(d) $f(x) = \frac{x^3}{4} + \frac{16}{x}$, $x > 0$; $f' = -\frac{f}{x} + x^2$, $f(4) = 20$ [S]

(e) h(x) = -2x - 1; h' = h( 0) = -1 

(f) $r(t) = \sqrt{t} - 3$, $t > 0$; $\dot r\,\ddot r\,t^2 = -\frac{1}{8}$, $r(9) = 0$, $\dot r(9) = \frac{1}{6}$. [HINT: The solution must satisfy
the o.d.e. and both conditions, $r(9) = 0$ and $\dot r(9) = \frac{1}{6}$.]

4. Solve the differential equation. 

(a) y' = 5x 4 ^ 

(b) y' = 3xe x 

(c) y = t — sin t ^ 

(d) y = 1, t < 0 W 

(e) s' = 1 — in x 

(f) s = 3 te 4 [A) 

5. Given are an initial value problem, its exact solution, 
and an approximate solution. Comment on how well 
the approximate solution approximates the exact solu- 
tion. 

(a) $\dot y = y$, $y(0) = 4$; $y(t) = 4e^t$;
$\{(0, 4), (.25, 5), (.5, 6.3), (.75, 7.8), (1, 9.8)\}$

(b) $y' = 3x^2 - \cos x$, $y(\pi) = 0$; $y(x) = x^3 - \sin x - \pi^3$;
$\{(\pi, 0), (\frac{5}{4}\pi, 30), (\frac{3}{2}\pi, 74), (\frac{7}{4}\pi, 135), (2\pi, 216)\}$

(c) $\dot s = (1 - 2s)t$, $s(0) = 1$; $s(t) = \frac{1}{2}\left(1 + e^{-t^2}\right)$;
$\{(0, 1), (.5, 1), (1, .75), (1.5, .5), (2, .5)\}$ [A]

(d) $f' = -\frac{f}{x} + x^2$, $f(4) = 20$; $f(x) = \frac{x^3}{4} + \frac{16}{x}$;
$\{(4, 20), (4.25, 23), (4.5, 26), (4.75, 30), (5, 34)\}$ [S]

(e) h! = h( 0) = -1; h(x) = —2x - 1; 

{(0, -1), (.25, -1.5), (.5, -2), (.75, -2.5), (1, -3)} 

(f) $\dot r\,\ddot r\,t^2 = -\frac{1}{8}$, $r(9) = 0$, $\dot r(9) = \frac{1}{6}$; $r(t) = \sqrt{t} - 3$;
$\{(9, 0), (10, .16), (11, .31), (12, .46), (13, .61)\}$

6. Draw a free body diagram for the situation. 

(a) Pendular motion ignoring air resistance (no 
damping). [A] 


(b) A block sliding down an inclined plane. ™ 

(c) A block sitting on an inclined plane (not moving). 
[S] 

(d) A block being pushed up an inclined plane. 

(e) A sofa being pushed across a level floor where the 
applied force is parallel to the floor. ^ 

(f) A sofa being pushed across a level floor where the 
applied force is not parallel to the floor. ^ 

(g) A sofa being pushed up an old, slanted hardwood 
floor. The applied force may or may not be par- 
allel to the floor. ^ 

(h) A sledder has reached the bottom of a hill (and is 
now traveling on level snow) and is coasting to a 
stop. [A] 

(i) A sledder sledding down a hill. ^ 

(j) A hockey puck sliding across an ice rink. ^ 

(k) A hockey puck sliding across ice at constant speed 
(ignoring friction). 

(l) A sky diver falling. ^ 

(m) A sky diver whose parachute just opened. ^ 

(n) A sky diver whose parachute just opened while a 
constant breeze is blowing sideways. ^ 

(o) A football originally kicked at a 40 degree angle 
just as it reaches its peak, ignoring drag. ^ 

(p) A football moving up and to the right approach- 
ing its peak, ignoring drag. 

7. Use the free body diagram from question 6 to find the
equation of motion in the tangential direction for (6a)-(6k),
and in the vertical direction for (6l)-(6p).

8. How much easier is it to slide a sofa by pushing paral- 
lel to the floor as opposed to slightly toward the floor? 
Compare the kinetic friction for a sofa being pushed 
parallel to the floor to one being pushed at an angle of 
20 degrees from parallel. Then calculate the necessary 
applied force to overcome kinetic friction in each case. 
Assume the floor is level. ^ 


Answers 


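$\theta = ae^{2t/3}$ is a solution of $3\ddot\theta - 8\dot\theta + 4\theta = 0$: $\dot\theta = \frac{2}{3}ae^{2t/3}$ and $\ddot\theta = \frac{4}{9}ae^{2t/3}$ so

$$\begin{aligned}
3\ddot\theta - 8\dot\theta + 4\theta &= 3\left(\tfrac{4}{9}ae^{2t/3}\right) - 8\left(\tfrac{2}{3}ae^{2t/3}\right) + 4\left(ae^{2t/3}\right) \\
&= \tfrac{4}{3}a(e^{2t/3}) - \tfrac{16}{3}a(e^{2t/3}) + \tfrac{12}{3}a(e^{2t/3}) \\
&= \left(\tfrac{4}{3}a - \tfrac{16}{3}a + \tfrac{12}{3}a\right)e^{2t/3} \\
&= 0.
\end{aligned}$$

$\theta = ce^{2t} + ae^{2t/3}$ is a solution of $3\ddot\theta - 8\dot\theta + 4\theta = 0$: $\dot\theta = 2ce^{2t} + \frac{2}{3}ae^{2t/3}$ and $\ddot\theta = 4ce^{2t} + \frac{4}{9}ae^{2t/3}$ so

$$\begin{aligned}
3\ddot\theta - 8\dot\theta + 4\theta &= 3\left(4ce^{2t} + \tfrac{4}{9}ae^{2t/3}\right) - 8\left(2ce^{2t} + \tfrac{2}{3}ae^{2t/3}\right) + 4\left(ce^{2t} + ae^{2t/3}\right) \\
&= 12c(e^{2t}) + \tfrac{4}{3}a(e^{2t/3}) - 16c(e^{2t}) - \tfrac{16}{3}a(e^{2t/3}) + 4c(e^{2t}) + \tfrac{12}{3}a(e^{2t/3}) \\
&= (12c - 16c + 4c)e^{2t} + \left(\tfrac{4}{3} - \tfrac{16}{3} + \tfrac{12}{3}\right)ae^{2t/3} \\
&= 0.
\end{aligned}$$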

Figure 6.2.1: Beginning a numerical solution with the initial condition



6.2 Taylor Methods 


The exact solution of the initial value problem

$$\dot y = -\frac{y}{t} + t^2, \qquad y(4) = 20 \qquad (6.2.1)$$

is $y(t) = \frac{t^3}{4} + \frac{16}{t}$, $t > 0$, as verified in exercise 3d on page 199. For the time being, let us try to forget that we know
the exact solution, and study a method for approximating it. We will recall that we have the exact solution when
we are ready to check how the approximation is going. The initial condition, $y(4) = 20$, means that the graph of
the exact solution passes through $(4, 20)$. What a great place to start an approximate solution: at a point that is
on the graph of the exact solution! Thus the approximation is seeded by the initial condition. There are numerous
ways to proceed from there. Perhaps the simplest way is to use the differential equation to compute the exact slope
(derivative) of $y$ at $(4, 20)$:

$$\dot y(4) = -\frac{y(4)}{4} + 4^2 = -\frac{20}{4} + 4^2 = 11.$$

You might imagine a graph like that in figure 6.2.1. The graph is that of the first order Taylor polynomial expanded
about $t_0 = 4$. According to Taylor’s theorem, $y(t) = 20 + 11(t - 4) + \frac{\ddot y(\xi)}{2}(t - 4)^2$ for $t$ near 4 and some $\xi$ depending
on $t$. So, $y(2) \approx T_1(2) = 20 + 11(2 - 4) = -2$ and $y(5) \approx T_1(5) = 20 + 11(5 - 4) = 31$ (as long as $y$ has two
derivatives on an open interval containing $[2, 5]$), and so on. As always, there is the concern of how good these
approximations are.

In section 4.4, two different approximations for the same number were used to estimate error in the adaptive
methods. A similar tack may be used here. We will compare approximations given by $T_1$ and $T_2$. The differential
equation can be used to compute $\ddot y$ in terms of $y$ and $t$. Implicitly differentiating the differential equation gives

$$\ddot y = -\frac{\dot y t - y}{t^2} + 2t.$$

But $\dot y = -\frac{y}{t} + t^2$, so we may substitute into and simplify the expression for $\ddot y$:

$$\begin{aligned}
\ddot y &= -\frac{\left(-\frac{y}{t} + t^2\right)t - y}{t^2} + 2t \\
&= -\frac{-y + t^3 - y}{t^2} + 2t \\
&= \frac{2y}{t^2} - \frac{t^3}{t^2} + 2t \\
&= \frac{2y}{t^2} - t + 2t \\
&= \frac{2y}{t^2} + t.
\end{aligned}$$

Table 6.1: Comparing first and second order polynomial approximations

t     $T_1(t)$    $T_2(t)$
2     -2          11
5     31          34.25


Figure 6.2.2: A repetitive numerical calculation (truncated to 5 decimal places)

$t_0$     $y(t_0)$
4         20
3.75      17.25
3.5       14.88437
3.25      12.88504
3         11.23557
2.75      9.92187
2.5       8.93323
2.25      8.26406
2         7.91666


Now we know $\ddot y(4) = \frac{2(20)}{4^2} + 4 = \frac{5}{2} + 4 = \frac{13}{2}$, so $T_2(t) = 20 + 11(t - 4) + \frac{13}{4}(t - 4)^2$. Finally, we can compare
values of $T_1$ to corresponding values of $T_2$, as in Table 6.1. $T_1(2)$ and $T_2(2)$ disagree wildly, so we should assume
that neither approximation is to be trusted. $T_1(5)$ and $T_2(5)$ differ by only around 10%, so these approximations
may be reasonable. To further hone the approximation of $y(2)$, it is possible to calculate $T_3(2)$ and again compare.
Can you do it? Answer on page 206.
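
The comparison of $T_1$ and $T_2$ is easy to automate. The sketch below is in Python (not the Octave used in this text) and assumes the values $y(4) = 20$, $\dot y(4) = 11$, and $\ddot y(4) = \frac{2(20)}{4^2} + 4 = \frac{13}{2}$ for initial value problem 6.2.1. The function names are illustrative, and `exact` is included only so the approximations can be judged afterward.

```python
def T1(t):
    """First degree Taylor polynomial of y about t0 = 4, using y(4) = 20 and y'(4) = 11."""
    return 20 + 11 * (t - 4)

def T2(t):
    """Second degree Taylor polynomial: adds the term (y''(4)/2)(t - 4)^2 with y''(4) = 13/2."""
    return T1(t) + (13 / 4) * (t - 4) ** 2

def exact(t):
    """Exact solution y(t) = t^3/4 + 16/t of 6.2.1, used only for comparison."""
    return t ** 3 / 4 + 16 / t

# Columns: t, T1(t), T2(t), exact y(t)
for t in (2, 5):
    print(t, T1(t), T2(t), exact(t))
```

At $t = 5$ the two polynomials roughly agree with each other and with the exact value, while at $t = 2$ they do not.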

Another way to approximate $y(2)$ is to take things a little more slowly. We could use the initial condition to
approximate $y(3.75)$ first. Then we could use this approximation to approximate $y(3.5)$, which we could, in turn,
use to approximate $y(3.25)$, and so on until we ultimately use the approximation of $y(2.25)$ to approximate $y(2)$.
We humans may think the prospect of doing all these calculations is repugnant, but with a little Octave code, the
burden is placed on the machine. It is the ability to understand the process well enough to write that Octave code
that now becomes the focus.

We know that $y(4) = 20$ and we are interested in approximating $y(3.75)$. Since the difference between 4 and 3.75
is only .25, perhaps using $T_1$ will be sufficiently accurate. From before, we know the Taylor polynomial expanded
about $t_0 = 4$ is $T_1(t) = 20 + 11(t - 4)$, so $T_1(3.75) = 20 + 11(-.25) = 17.25$. Now we can use $y(3.75) = 17.25$ as a
“new” initial condition. $\dot y(3.75) = -\frac{17.25}{3.75} + 3.75^2 = 9.4625$. We can use this information to approximate the Taylor
polynomial for $y$ expanded about 3.75: $T_1(t) \approx 17.25 + 9.4625(t - 3.75)$, and use this expansion to approximate
$y(3.5)$: $y(3.5) \approx T_1(3.5) \approx 17.25 + 9.4625(3.5 - 3.75) = 14.884375$. We then can use $y(3.5) = 14.884375$ as an initial
condition, approximating the Taylor polynomial for $y$ expanded about 3.5. Continuing in this vein leads to the
tabular and graphical results in Figure 6.2.2. Can you reproduce these results? Details on page 206.

The method of repeated calculation leads to $y(2) \approx 7.91$, but more importantly, illuminates an algorithm
for approximating solutions of differential equations. Calling the initial condition $(t_0, y_0)$, and succeeding points
$(t_1, y_1), (t_2, y_2), (t_3, y_3), \ldots$, the same procedure is used to calculate $(t_1, y_1)$ from $(t_0, y_0)$ as is used to calculate $(t_2, y_2)$
from $(t_1, y_1)$ as is used to calculate $(t_3, y_3)$ from $(t_2, y_2)$, and so on. It remains to capture that procedure as a
formula of some sort. To summarize, the procedure is to use a given point, call it $(t_i, y_i)$, to

1. calculate $\dot y(t_i, y_i)$;

2. use the three values $t_i$, $y_i$, and $\dot y(t_i, y_i)$ to form $T_1(t)$ expanded about $t_i$; and finally

3. set $y_{i+1} = T_1(t_{i+1})$, which gives a new point, $(t_{i+1}, y_{i+1})$.

But $T_1(t_{i+1}) = y_i + \dot y(t_i, y_i) \cdot (t_{i+1} - t_i)$, so the procedure really boils down to setting

$$y_{i+1} = y_i + \dot y(t_i, y_i) \cdot (t_{i+1} - t_i). \qquad (6.2.2)$$

The method of using formula (6.2.2) repeatedly to compute a sequence of points approximately on the solution of
an ordinary differential equation is most often called Euler’s method. [7] It may also be referred to as the Taylor
method of degree 1 since it uses Taylor polynomials of degree 1 at each step. The value $t_{i+1} - t_i$ is called the step
size and is often held constant, so you are likely to see Euler’s method written as

$$y_{i+1} = y_i + h \cdot \dot y(t_i, y_i) \qquad (6.2.3)$$

where $h = t_{i+1} - t_i$ is the constant step size.

Euler’s Method (pseudo-code) 

As is most common, Euler’s method will be coded for a constant step size. 

Assumptions: The solution of the o.d.e. exists and is unique on the interval from $t_0$ to $t_1$.

Input: Differential equation $\dot y = f(t, y)$; initial condition $y(t_0) = y_0$; numbers $t_0$ and $t_1$; number of steps $N$.

Step 1: Set $t = t_0$; $y = y_0$; $h = (t_1 - t_0)/N$
Step 2: For $j = 1 \ldots N$ do Steps 3-4:

Step 3: Set $y = y + hf(t, y)$

Step 4: Set $t = t_0 + \frac{j}{N}(t_1 - t_0)$

Output: Approximation $y$ of the solution at $t = t_1$.
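
The pseudo-code translates almost line for line into a program. The sketch below is in Python rather than Octave, so it is not a substitute for the Octave function requested in the exercises; the function name `euler` and its argument order are choices made here for illustration. It is applied to initial value problem 6.2.1, $\dot y = -\frac{y}{t} + t^2$ with $y(4) = 20$, to approximate $y(2)$.

```python
def euler(f, t0, y0, t1, N):
    """Approximate y(t1) for y' = f(t, y), y(t0) = y0, using N Euler steps."""
    t, y = t0, y0
    h = (t1 - t0) / N                  # Step 1: constant step size (negative when t1 < t0)
    for j in range(1, N + 1):          # Step 2
        y = y + h * f(t, y)            # Step 3: follow the tangent line for one step
        t = t0 + (j / N) * (t1 - t0)   # Step 4: recompute t directly from t0 and t1
    return y

def f(t, y):
    """Right-hand side of initial value problem 6.2.1."""
    return -y / t + t ** 2

# Step sizes of magnitude 0.5, 0.25, and 0.125 on the interval from 4 to 2
for N in (4, 8, 16):
    print(N, euler(f, 4, 20, 2, N))
```

Step 4 recomputes $t$ from $t_0$ and $t_1$ instead of adding $h$ repeatedly, which keeps round-off error from accumulating in $t$.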

Higher Degree Taylor Methods 

Taylor methods of higher degree are rarely used in practice because they require computation of derivatives, a task 
that is not always easy or even possible. Nonetheless, it is not a huge stretch from what we have already done 
to consider higher degree methods. Rewriting the steps outlined in the enumeration that leads to 6.2.2, the third 
degree Taylor method can be summarized by 

1. calculate $\dot y(t_i, y_i)$ and $\ddot y(t_i, y_i)$ and $\dddot y(t_i, y_i)$;

2. use the ~~three~~ five values $t_i$, $y_i$, ~~and~~ $\dot y(t_i, y_i)$, $\ddot y(t_i, y_i)$, and $\dddot y(t_i, y_i)$ to form ~~$T_1(t)$~~ $T_3(t)$ expanded about $t_i$; and
finally

3. set ~~$y_{i+1} = T_1(t_{i+1})$~~ $y_{i+1} = T_3(t_{i+1})$, which gives a new point, $(t_{i+1}, y_{i+1})$.

Now written without all the markup, the procedure is

1. calculate $\dot y(t_i, y_i)$, $\ddot y(t_i, y_i)$, and $\dddot y(t_i, y_i)$;

2. use the five values $t_i$, $y_i$, $\dot y(t_i, y_i)$, $\ddot y(t_i, y_i)$, and $\dddot y(t_i, y_i)$ to form $T_3(t)$ expanded about $t_i$; and finally

3. set $y_{i+1} = T_3(t_{i+1})$, which gives a new point, $(t_{i+1}, y_{i+1})$.

Higher degree Taylor methods require higher derivatives in step 1 and a higher degree Taylor polynomial in steps
2 and 3. As should be expected, higher degree methods are generally more accurate than lower degree methods as
long as the formula for $\dot y(t, y)$ is sufficiently smooth. To illustrate the point, we now compare approximate solutions
of 6.2.1.

Taylor’s Method of Degree 3 (pseudo-code) 

Taylor’s method of degree 3 will be coded for a constant step size. 

Assumptions: The solution of the o.d.e. exists and is unique on the interval from $t_0$ to $t_1$.

Input: Differential equation $\dot y = f(t, y)$; formulas $\ddot y(t, y)$ and $\dddot y(t, y)$; initial condition $y(t_0) = y_0$; numbers
$t_0$ and $t_1$; number of steps $N$.

Step 1: Set $t = t_0$; $y = y_0$; $h = (t_1 - t_0)/N$
Step 2: For $j = 1 \ldots N$ do Steps 3-4:

Step 3: Set $y = y + hf(t, y) + \frac{1}{2}h^2\ddot y(t, y) + \frac{1}{6}h^3\dddot y(t, y)$

Step 4: Set $t = t_0 + \frac{j}{N}(t_1 - t_0)$

Output: Approximation $y$ of the solution at $t = t_1$.
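
As with Euler’s method, the pseudo-code can be turned into a short program. The sketch below is again Python, not Octave, and the names `taylor3`, `d2`, and `d3` are illustrative. For initial value problem 6.2.1 it assumes the derivative formulas $\ddot y = \frac{2y}{t^2} + t$ and $\dddot y = -\frac{6y}{t^3} + 3$, which follow from differentiating $\dot y = -\frac{y}{t} + t^2$ and substituting for $\dot y$.

```python
def taylor3(f, d2, d3, t0, y0, t1, N):
    """Approximate y(t1) using N steps of Taylor's method of degree 3.

    f, d2, d3 give y', y'', y''' as functions of (t, y).
    """
    t, y = t0, y0
    h = (t1 - t0) / N                   # Step 1
    for j in range(1, N + 1):           # Step 2
        # Step 3: evaluate the degree 3 Taylor polynomial one step ahead
        y = y + h * f(t, y) + h ** 2 / 2 * d2(t, y) + h ** 3 / 6 * d3(t, y)
        t = t0 + (j / N) * (t1 - t0)    # Step 4
    return y

def f(t, y):
    return -y / t + t ** 2              # y' from 6.2.1

def d2(t, y):
    return 2 * y / t ** 2 + t           # y'' (assumed formula, derived from y')

def d3(t, y):
    return -6 * y / t ** 3 + 3          # y''' (assumed formula, derived from y'')

# Approximate y(2); the exact value is 10
for N in (4, 8, 16):
    approx = taylor3(f, d2, d3, 4, 20, 2, N)
    print(N, approx, abs(10 - approx))
```

The price of the extra accuracy is that the formulas for $\ddot y$ and $\dddot y$ must be worked out by hand before the method can be used.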

Table 6.2: Approximate values of $y(2)$ from solving 6.2.1

                            h = 0.5     error       h = 0.25    error       h = 0.125   error
Euler’s method              6.1         3.9         7.91666     2.08333     8.91911     1.08088
Taylor’s degree 3 method    9.975765    0.024234    9.996280    0.003719    9.999485    0.000514


Using Octave code based on the pseudo-code presented in this section, Table 6.2 summarizes the approximate
solution of 6.2.1 using Euler’s method and Taylor’s method of degree 3 to approximate $y(2)$.

Now is a good time to say something about the error of Taylor methods. Remember a Taylor polynomial of 
degree n has an error of order n+ 1, so Euler’s method uses a Taylor polynomial with error of order 2 and Taylor’s 
degree 3 method uses a Taylor polynomial with error of order 4. But how does that translate into an error term 
for the Taylor method ? 

Though we will not answer this question completely here, we can get some idea what to expect from Table 6.2.
From the Euler’s method row, we see the error decrease from (roughly) 3.9 to 2.08 to 1.08 as the step size is reduced
by a factor of one half. Since

$$\frac{2.08}{3.9} \approx \frac{1.08}{2.08} \approx \left(\frac{1}{2}\right)^1,$$

we conclude that Euler’s method is of first order. Considering the row on Taylor’s degree 3 method, we see the
error decrease from about .024 to .0037 to .00051 as the step size is reduced by a factor of one half. Since

$$\frac{.0037}{.024} \approx \frac{.00051}{.0037} \approx \frac{1}{8} = \left(\frac{1}{2}\right)^3,$$

we conclude that Taylor’s degree 3 method is of order 3.

Notice the similarity between this observation and the observation we made about composite integration. In
section 4.4, we argued that the error term for a composite integration formula had order one less than that of a
single application of the underlying integration formula. The same thing happens here. When the truncation error
for the underlying Taylor polynomial has order $n$, the corresponding o.d.e. solver has order $n - 1$, an order equal
to the degree of the Taylor polynomial itself.
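
The order suggested by a table of errors can be estimated directly: if halving $h$ multiplies the error by about $(1/2)^p$, then $p \approx \log_2$ of the ratio of successive errors. The sketch below, in Python rather than Octave, applies this to the errors of Table 6.2. The name `observed_orders` is an illustrative choice, and the result is only an estimate of the order, not a proof of it.

```python
import math

# Errors from Table 6.2 for h = 0.5, 0.25, 0.125
errors = {
    "Euler": [3.9, 2.08333, 1.08088],
    "Taylor degree 3": [0.024234, 0.003719, 0.000514],
}

def observed_orders(errs):
    """Return log2(e_h / e_(h/2)) for each pair of successive errors."""
    return [math.log2(coarse / fine) for coarse, fine in zip(errs, errs[1:])]

for name, errs in errors.items():
    print(name, [round(p, 2) for p in observed_orders(errs)])
```

The estimates for Euler’s method come out a little below 1 and those for Taylor’s degree 3 method a little below 3, and both move toward the whole number as $h$ shrinks.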

Reducing a second order equation to a first order system 

Taylor’s methods and the upcoming Runge-Kutta methods are all designed to work on first order differential
equations. However, all the equations of motion we have developed are second order differential equations. To
resolve this disconnect, a second order o.d.e. can be reduced to a first order system. The idea is straightforward.
Suppose $y$ is the dependent variable in a second order o.d.e. and we have an equation of the form $y'' = f(y', y, x)$.
We introduce an auxiliary variable $u$ and set $u = y'$. Consequently, $u' = y'' = f(y', y, x) = f(u, y, x)$. We thus have
the first order system

$$u' = f(u, y, x)$$
$$y' = u$$

which can be solved using a numerical method for first order differential equations.

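For example, the equation of a pendulum (6.1.1) can be rearranged as $\ddot\theta = -\frac{c}{m}\dot\theta - \frac{g}{\ell}\sin\theta$. If we substitute the
auxiliary variable $u = \dot\theta$ into the equation, it becomes $\dot u = -\frac{c}{m}u - \frac{g}{\ell}\sin\theta$, and the system

$$\dot u = -\frac{c}{m}u - \frac{g}{\ell}\sin\theta$$
$$\dot\theta = u$$

is equivalent to (6.1.1). Euler’s method, for example, can be applied to this system in the following way:

$$u_{n+1} = u_n + h\left(-\frac{c}{m}u_n - \frac{g}{\ell}\sin\theta_n\right)$$
$$\theta_{n+1} = \theta_n + hu_n$$
$$t_{n+1} = t_n + h$$

where $u_0$, $\theta_0$, and $t_0$ are taken from the initial conditions.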
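
A short program shows how the two update formulas are carried out together. The sketch below is in Python rather than Octave. The function name `pendulum_euler` and the default values of $g$, $\ell$, $c$, and $m$ are hypothetical, chosen only so the code runs; they are not taken from this text.

```python
import math

def pendulum_euler(theta0, u0, h, steps, g=9.81, ell=1.0, c=0.5, m=1.0):
    """Apply Euler's method to u' = -(c/m)u - (g/ell)sin(theta), theta' = u.

    The default parameter values are hypothetical placeholders.
    Returns the approximations (theta, u) after the given number of steps.
    """
    theta, u = theta0, u0
    for _ in range(steps):
        du = -(c / m) * u - (g / ell) * math.sin(theta)
        # Both updates use the old values u_n and theta_n
        theta, u = theta + h * u, u + h * du
    return theta, u

# Release the bob from rest at 0.5 rad and follow it for one second
print(pendulum_euler(0.5, 0.0, 0.01, 100))
```

The simultaneous assignment matters: updating `u` first and then using the new `u` to update `theta` would be a different method from the one written above.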

Key Concepts

Taylor method: A method for approximating the solution of a first order o.d.e. in which a Taylor polynomial of
some predetermined order is used at each step to compute the next.

Euler’s method: Another name for the first order Taylor method, having formula $y_{i+1} = y_i + h \cdot \dot y(t_i, y_i)$.


Exercises 

1. Use Euler’s method with step size $h = 0.5$ to approximate $y(2)$.

(a) $\dfrac{dy}{dx} = 3x - 2y$, $y(1) = 1$

(b) $\dfrac{dy}{dx} = 3x^3 - y$, $y(1) = 3$

(c) $\dot y = ty$, $y(1) = 0.5$

(d) [S] $\cos(x)y' + \sin(x)y = 2\cos^3(x)\sin(x) - 1$, $y(1) = 0$

(e) $7\dot y + 3y = 5$, $y(1) = 2$

2. Repeat exercise 1 using Taylor’s method of order 2. 

[S] [A] 

3. Repeat exercise 1 using Taylor’s method of order 3. 

[S][A] 

4. Execute two steps of Euler’s method for solving $\dot y = ty$
with $y(1) = -0.5$ and $h = 0.25$, thus approximating
$y(1.5)$. [A]

5. Write pseudo-code for Taylor’s method of order 2.

6. Write pseudo-code for Taylor’s method of order 4.

7. Write an Octave function that implements Euler’s
method.

8. Write an Octave function that implements Taylor’s
method of degree 2.

9. Write an Octave function that implements Taylor’s
method of degree 3.

10. Write an Octave function that implements Taylor’s
method of degree 4.

11. Use your code from exercise 8 to calculate $y(2)$ for the
o.d.e. in 1a using $h = 0.5$, 0.25, 0.125, and 0.0625. Use
your calculations and the fact that the exact value of
$y(2)$ is $\frac{9}{4} + \frac{1}{4e^2}$ to verify that Taylor’s method of degree
2 is an order 2 numerical method.

12. Use your code from exercise 9 to calculate $y(2)$ for the
o.d.e. in 1a using $h = 0.5$, 0.25, 0.125, and 0.0625. Use
your calculations and the fact that the exact value of
$y(2)$ is $\frac{9}{4} + \frac{1}{4e^2}$ to verify that Taylor’s method of degree
3 is an order 3 numerical method.

13. Use your code from exercise 10 to calculate $y(2)$ for the
o.d.e. in 1a using $h = 0.5$, 0.25, 0.125, and 0.0625. Use
your calculations and the fact that the exact value of
$y(2)$ is $\frac{9}{4} + \frac{1}{4e^2}$ to verify that Taylor’s method of degree
4 is an order 4 numerical method.

14. Write the equation of motion you derived in exercise 7
on page 199 as a first order system.

15. Given the following parameter values and initial conditions
for the referenced system, use Euler’s method
with a step size $h = 0.25$ to compute $s(0.5)$ or $\theta(0.5)$
as appropriate.

14a: g = 9.81 m/s 2 ; l = .31 m; #(0) = f ; 6»(0) = 0 [A1 

14b: $g = 32.2$ ft/s$^2$; $\mu = .21$; $\alpha = .25$ rad; $s(0) = 0$;
$\dot s(0) = .3$ ft/s [A]

14c: $g = 32.2$ ft/s$^2$; $\mu = .21$; $\alpha = .25$ rad; $s(0) = 0$;
$\dot s(0) = 0$ [S]

14d: $g = 32.2$ ft/s$^2$; $\mu = .21$; $\alpha = .25$ rad; $m = .19$
lbm; $F_{\text{applied}} = 15$ lb; $s(0) = 0$; $\dot s(0) = 1$ ft/s

14e: $g = 9.81$ m/s$^2$; $\mu = .15$; $m = 35$ kg; $F_{\text{applied}} = 75$
N; $s(0) = 0$; $\dot s(0) = .03$ m/s [A]

14f: g = 9.81 m/s 2 ; y = .15; /3 = fg rad; m = 35 kg; 
Fappiied = 75 N; s(0) = 0; s(0) = .03 m/s [s] 

14g: g = 9.81 m/s 2 ; y = .15; a = .05 rad; P = jq rad; 
m = 35 kg; F ap pUed = 90 N; s(0) = 0; s(0) = .03 
m/s [A) 

14h: $g = 32.2$ ft/s$^2$; $\mu = .01$; $s(0) = 0$; $\dot s(0) = 30$ ft/s [A]

14i: g = 32.2 ft/s 2 ; y = .01; a = rad; s(0) = 0; 
s(0) = 10 ft/s [A) 

14j: $g = 32.2$ ft/s$^2$; $\mu = .003$; $s(0) = 0$; $\dot s(0) = 88$ ft/s [A]

14k: $g = 32.2$ ft/s$^2$; $\mu = 0$; $s(0) = 0$; $\dot s(0) = 88$ ft/s

14l: $g = 9.81$ m/s$^2$; $c = 4.5$; $m = 70$ kg; $s(0) = 10000$;
$\dot s(0) = -10$ m/s [A]

14m: $g = 9.81$ m/s$^2$; $c = 26$; $m = 70$ kg; $s(0) = 2000$;
$\dot s(0) = -55$ m/s

16. Find a formula for the angle at which a stationary block 
on an inclined plane (whose angle of inclination is in- 
creasing) will start moving. 

17. Find a formula for the angle at which a block moving 
down an inclined plane (whose angle of inclination is 
decreasing) will stop moving. 


= 3a: — 2 y 

= 1 

= 3a; 3 - y 

= 3 

18. Undetermined Coefficients. For each differential
equation, a solution with undetermined coefficients is 
suggested. Find values for the coefficients that make 
the suggested solution an actual solution. 

(a) ^y" + 5 y' — 8 y = “Ax 2 ; y(x) = Ax 2 + Bx + C 

(b) 2 y'" — 5 y" + Ay' + 5y = x + 1; y{x) = Ax + B 

(c) ^3 y' + 2y = Ax + 2; y(x) = Ax + B 

(d) ^y" — 14 y' + 7 y = 2x 2 + Ax — 1; y(x) = Ax 2 + 
Bx + C 


(e) [A] 2y + y = t i + l; y{t) = A + Bt + Ct 2 + Dt 3 + Et 4 

(f) x + 2x — x = 1 + te*; x(t) = Ate * + Be* + C 

(g) ^0 — 9 = e~ * sin t\ 9{t) = Ae~ * sin t + Be~* cos t 

(h) ^6 + A-9 + 9 = t cos t; 9(t) = At cos t + Bf sin t + 
C cos t + D sin t 

(i) ^a; — 2x — A5x = e 7t + 1; x(t) = Ate 7 * + Be 7 * + C 


Answers 

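$T_3(2)$: Begin by calculating $\dddot y = \frac{d}{dt}\ddot y$.

$$\begin{aligned}
\dddot y &= \frac{d}{dt}\left(\frac{2y}{t^2} + t\right) \\
&= \frac{2\dot y t^2 - 4ty}{t^4} + 1 \\
&= \frac{2\left(-\frac{y}{t} + t^2\right)t^2 - 4ty}{t^4} + 1 \\
&= \frac{-2ty + 2t^4 - 4ty}{t^4} + 1 \\
&= -\frac{6y}{t^3} + 3
\end{aligned}$$

so $\dddot y(4) = -\frac{6(20)}{4^3} + 3 = 3 - \frac{15}{8} = \frac{9}{8}$. Therefore, $T_3(t) = 20 + 11(t - 4) + \frac{13}{4}(t - 4)^2 + \frac{3}{16}(t - 4)^3$, and $T_3(2) = 9.5$
so it is close to $T_2(2) = 11$. We can start to believe that $y(2)$ is somewhere around 9.5 or 11.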


Details:

t_0     y(t_0)      \dot{y}(t_0)   T_1 expanded about t_0            T_1(t_0 - .25)
4       20          11             20 + 11(t - 4)                    17.25
3.75    17.25       9.4625         17.25 + 9.4625(t - 3.75)          14.88437
3.5     14.88437    7.99732        14.88437 + 7.99732(t - 3.5)       12.88504
3.25    12.88504    6.59787        12.88504 + 6.59787(t - 3.25)      11.23557
3       11.23557    5.25480        11.23557 + 5.25480(t - 3)         9.92187
2.75    9.92187     3.95454        9.92187 + 3.95454(t - 2.75)       8.93323
2.5     8.93323     2.67670        8.93323 + 2.67670(t - 2.5)        8.26406
2.25    8.26406     1.38958        8.26406 + 1.38958(t - 2.25)       7.91666
2       7.91666
7.91666 
6.3 Foundations for Runge-Kutta Methods

In section 6.2, derivatives were used to generate approximate solutions of ordinary differential equations. However, approximate solutions can also be generated by integrating, a much more stable numerical process. An o.d.e. of the form

$$\dot{y} = f(t,y), \qquad y(t_0) = y_0$$

has an exact solution that can be written in terms of an integral. For any value $t$, and assuming existence of a solution over the interval from $t_0$ to $t$, we can find a value for $y(t)$ by integrating both sides of $\dot{y} = f(t,y)$ with respect to $t$:

$$\begin{aligned}
\int_{t_0}^{t} \dot{y}\,dt &= \int_{t_0}^{t} f(t,y)\,dt \\
y(t) - y(t_0) &= \int_{t_0}^{t} f(t,y)\,dt \\
y(t) &= y(t_0) + \int_{t_0}^{t} f(t,y)\,dt.
\end{aligned} \tag{6.3.1}$$

When $t_0$ and $t$ are not close to one another, which is what we normally assume, we need to proceed in small steps as done in section 6.2.

Substituting $t_1$ for $t$ in equation 6.3.1, $y(t_1) = y(t_0) + \int_{t_0}^{t_1} f(t,y)\,dt$, so we can add $\int_{t_0}^{t_1} f(t,y)\,dt$ to the known value $y(t_0)$ to get $y(t_1)$, our first small step on the way to approximating $y(t)$. Now substituting $t_1$ for $t_0$ and $t_2$ for $t$ in equation 6.3.1, $y(t_2) = y(t_1) + \int_{t_1}^{t_2} f(t,y)\,dt$. So, we can compute $y(t_2)$ from knowledge of $y(t_1)$. Similarly we can compute $y(t_3)$ from knowledge of $y(t_2)$, $y(t_4)$ from knowledge of $y(t_3)$, and so on, eventually computing $y(t_n) = y(t)$. With this in mind, we rewrite the integral representation in terms of $t_i$ and $t_{i+1}$ instead of $t_0$ and $t$:

$$y(t_{i+1}) = y(t_i) + \int_{t_i}^{t_{i+1}} f(t,y)\,dt. \tag{6.3.2}$$

This formula suggests that finding one approximation, $y(t_{i+1})$, from the previous, $y(t_i)$, boils down to approximating $\int_{t_i}^{t_{i+1}} f(t,y)\,dt$. That should not be too challenging at this point. About half of chapter 4 is dedicated to exactly this task! Every numerical integration formula is a candidate for use here, but let’s start simple. We know $y(t_i)$, the value of the function at the left endpoint of integration, at least approximately, so it makes sense to use a stencil that includes the left endpoint of integration as one of the nodes. And to make our first stab as easy as possible, let’s let that node be the only one! That is, let’s find an integration formula for the stencil

[stencil diagram: a single node at 0 on the interval from 0 to 1].


Using the method of undetermined coefficients, we calculate the left hand side of system 4.2.4 (which for us will only be one equation since we only have one node):

$$\int_a^b p_0(x)\,dx = \int_{x_0}^{x_0+h} p_0(x)\,dx = \int_{x_0}^{x_0+h} 1\,dx = (x - x_0)\Big|_{x_0}^{x_0+h} = h$$

and the right hand side:

$$\sum_{i=0}^{0} (\theta_i h)^0 a_i = a_0.$$

So $a_0 = h$ and we get the formula

$$\int_{x_0}^{x_0+h} f(x)\,dx \approx h f(x_0).$$

Consequently, $\int_{t_i}^{t_{i+1}} f(t,y)\,dt \approx (t_{i+1} - t_i) f(t_i, y(t_i))$, and equation 6.3.2 becomes

$$y(t_{i+1}) \approx y(t_i) + f(t_i, y(t_i)) \cdot (t_{i+1} - t_i).$$
Adopting the notation $y_i = y(t_i)$ and $f = \dot{y}$ from section 6.2, this formula becomes

$$y_{i+1} = y_i + \dot{y}(t_i, y_i) \cdot (t_{i+1} - t_i).$$

Wait a minute! We’ve seen this before. This is exactly equation 6.2.2.

The search for new methods of approximating solutions of o.d.e.s by integrating has not yielded anything new 
yet. It has to be different, however. Integration formulas include evaluation of the integrand at various points 
while Taylor methods involve evaluation of derivatives at a single point. Let’s push on. Perhaps the next simplest 
integration formula that includes the left endpoint of integration is the trapezoidal rule (see section 4.3), 

$$\int_{x_0}^{x_0+h} f(x)\,dx = \frac{h}{2}\left[f(x_0) + f(x_0 + h)\right] + O(h^3 f''(\xi_h))$$

over the stencil

[stencil diagram: nodes at 0 and 1 on the interval from 0 to 1].


Translating the trapezoidal rule to the current notation,

$$\int_{t_i}^{t_{i+1}} f(t,y)\,dt = \frac{t_{i+1} - t_i}{2}\left[f(t_i, y_i) + f(t_{i+1}, y_{i+1})\right] + O((t_{i+1} - t_i)^3).$$

Therefore our new approximation formula is

$$y_{i+1} = y_i + \frac{t_{i+1} - t_i}{2}\left[f(t_i, y_i) + f(t_{i+1}, y_{i+1})\right].$$

This equation is great except the right hand side includes $y_{i+1}$, the quantity we are trying to approximate! One theory is to leave it at that. The equation for $y_{i+1}$ is implicit in nature and that’s alright. Some root finding method could be used to determine $y_{i+1}$ for each step of the method. While this path is not impossible, it is also not the simplest solution. Since the step size $(t_{i+1} - t_i)$ is likely to be small, perhaps using Euler’s method to approximate $y_{i+1}$ on the right side will not cause irreparable harm to the overall approximation. Giving it a shot, we let $y_{i+1} = y_i + (t_{i+1} - t_i) \cdot f(t_i, y_i)$ on the right hand side to get the new formula

$$y_{i+1} = y_i + \frac{t_{i+1} - t_i}{2}\left[f(t_i, y_i) + f\big(t_{i+1},\, y_i + (t_{i+1} - t_i) \cdot f(t_i, y_i)\big)\right].$$

Pausing for a moment to consider what we have, we might conclude the formula is getting a little unwieldy. Let’s see if we can tidy it up a bit. First, substituting $h$ for $t_{i+1} - t_i$ makes it a little nicer:

$$y_{i+1} = y_i + \frac{h}{2}\left[f(t_i, y_i) + f\big(t_{i+1},\, y_i + h \cdot f(t_i, y_i)\big)\right].$$

Second, letting $k_1 = f(t_i, y_i)$ and $k_2 = f(t_{i+1}, y_i + h \cdot f(t_i, y_i)) = f(t_{i+1}, y_i + h k_1)$, we get a nice, neat, three-step computation:

$$\begin{aligned}
k_1 &= f(t_i, y_i) \\
k_2 &= f(t_{i+1}, y_i + h k_1) \\
y_{i+1} &= y_i + \frac{h}{2}(k_1 + k_2).
\end{aligned} \tag{6.3.3}$$

But before getting too carried away with the clean formulation, it would be nice to have some evidence that this “advanced” method gives a reasonable approximation of the solution to an o.d.e. as expected. Let’s have Octave compute approximate solutions of o.d.e. 6.2.1 using both Euler’s method and this method based on the trapezoidal rule, and compare them to the exact solution, $y(t) = \frac{t^3}{4} + \frac{16}{t}$. The following code snippet, while specific to this one task, can be generalized to find approximate solutions of other o.d.e.s as well.
O.D.E. solver test code

t=4;                                 % initial time
h=-1/4;                              % step size (negative: stepping from t=4 down to t=2)
f=inline("-y/t+t^2");                % right hand side of the o.d.e., called as f(t,y)
exact=inline("t^3/4+16/t");          % exact solution, for computing errors
euler=20;                            % initial value y(4)=20 for Euler's method
trap=20;                             % initial value y(4)=20 for the trapezoidal method
disp('             Euler   Trapezoid       Exact   Euler err    Trap err')
disp(' ')
for i=1:8
  euler=euler+h*f(t,euler);          % one step of Euler's method
  k1=f(t,trap);                      % the three-step computation (6.3.3)
  k2=f(t+h,trap+h*k1);
  trap=trap+h/2*(k1+k2);
  t=t+h;
  x=exact(t);
  sprintf('%12.5g%12.5g%12.5g%12.5g%12.5g',euler,trap,x,abs(euler-x),abs(trap-x))
end%for

This test code may be downloaded at the companion website (rungeKuttaDemo.m). The only part of this code that may appear unfamiliar to you at this point is the sprintf() command. The first argument,

'%12.5g%12.5g%12.5g%12.5g%12.5g',

is the formatting string. This particular string means to string together 5 floating point numbers using 12 spaces each and displaying 5 significant digits. In the sprintf command, %12.5g means “general” formatting of a floating point number with 12 spaces and 5 significant figures. The computer will decide whether to use scientific notation in the output. Since it is repeated 5 times, this particular command will format five such floating point values. The rest of the arguments are the five numbers to print. The command sprintf should not be read as “sprint-eff” but rather “ess-print-eff” or “string print formatted”. The s is for string and the f is for formatted. If you’re thinking this command seems a bit arcane, you’re right. This type of print formatting command originated in the C programming language during the 1970s!¹ The output of running this Octave code is



             Euler   Trapezoid       Exact   Euler err    Trap err

ans =        17.25      17.442       17.45     0.20026   0.0080729
ans =       14.884      15.273       15.29      0.4058    0.016741
ans =       12.885      13.479      13.505     0.62006    0.026142
ans =       11.236      12.047      12.083     0.84776    0.036458
ans =       9.9219      10.969      11.017      1.0955     0.04794
ans =       8.9332      10.245      10.306       1.373    0.060938
ans =       8.2641      9.8828      9.9588      1.6947    0.075955
ans =       7.9167      9.9062          10      2.0833     0.09375


Our method based on the trapezoidal rule, which we will call trapezoidal-ode for now, seems to do a better job of approximating the solution of this o.d.e. than does Euler’s method. The last two columns contain the absolute errors for each approximation. The errors in trapezoidal-ode are roughly 0.01 to 0.1 while the errors for Euler’s method are roughly 0.2 to 2. All of the errors in trapezoidal-ode are smaller than all the errors in Euler’s method. Of course trapezoidal-ode requires two evaluations of $f$ per step, so it had better deliver better results for the extra work if it is to be useful at all.
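The three-step computation (6.3.3) can also be wrapped in a reusable function. The following is only a minimal sketch under stated assumptions, not the companion-website code: the function name `trapezoidal_ode` and its argument list are hypothetical, chosen here for illustration.

```octave
1;  % a leading command so Octave treats this file as a script, not a function file

% Hypothetical helper (name and signature are assumptions for illustration):
% take n steps of size h of trapezoidal-ode (6.3.3) for y' = f(t,y),
% starting from y(t0) = y0, and return the approximation of y(t0 + n*h).
function y = trapezoidal_ode(f, t0, y0, h, n)
  t = t0;
  y = y0;
  for i = 1:n
    k1 = f(t, y);              % slope at the left endpoint
    k2 = f(t + h, y + h*k1);   % slope at the right endpoint, using an Euler estimate of y
    y = y + h/2*(k1 + k2);     % trapezoidal rule applied to the integral in (6.3.2)
    t = t + h;
  end
end

% Usage on test o.d.e. 6.2.1: eight steps of size -1/4 from y(4) = 20.
f = @(t, y) -y/t + t^2;
y2 = trapezoidal_ode(f, 4, 20, -1/4, 8);
printf("%12.5g\n", y2);
```

With these inputs the result should agree with the last Trapezoid entry in the output above (9.9062 to five significant digits).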

¹ See https://en.wikipedia.org/wiki/Printf_format_string for some details.

Buoyed by this success, perhaps it is worth investing some time in other integration formulas, like Simpson’s rule, for example. Recall from section 4.3, Simpson’s rule states

$$\int_{x_0}^{x_0+2h} f(x)\,dx = \frac{h}{3}\left[f(x_0) + 4f(x_0 + h) + f(x_0 + 2h)\right] + O(h^5 f^{(4)}(\xi_h)),$$

which in the notation of this section we might write as

$$\int_{t_i}^{t_{i+1}} f(t,y)\,dt = \frac{h}{6}\left[f(t_i, y_i) + 4f(t_{i+1/2}, y_{i+1/2}) + f(t_{i+1}, y_{i+1})\right],$$

ignoring the error term, and using the notation $t_{i+1/2}$ to mean $t_i + \frac{1}{2}h$ and $y_{i+1/2}$ to mean $y(t_i + \frac{1}{2}h)$. So an o.d.e. solver based on Simpson’s rule might look like

$$y_{i+1} = y_i + \frac{h}{6}\left[f(t_i, y_i) + 4f(t_{i+1/2}, y_{i+1/2}) + f(t_{i+1}, y_{i+1})\right].$$

Again, this is an implicit formula. Again, we can use Euler’s method to estimate $y_{i+1}$, and, in fact, we can use Euler’s method to estimate $y_{i+1/2}$ too! Since $t_{i+1/2}$ is closer to $t_i$ than is $t_{i+1}$, we estimate $y_{i+1/2}$ first. That is, we replace $y_{i+1/2}$ by $y_i + \frac{h}{2} f(t_i, y_i)$. Using a multiple-step calculation as before, that gives us

$$\begin{aligned}
k_1 &= f(t_i, y_i) \\
k_2 &= f\left(t_i + \frac{h}{2},\, y_i + \frac{h}{2}k_1\right)
\end{aligned}$$

so far. This takes care of the first two terms in brackets. Now we estimate $y_{i+1}$ by approximating $f(t_{i+1}, y_{i+1})$. But we now have an estimate of $f$ at $t_i + \frac{h}{2}$, and $t_i + \frac{h}{2}$ is closer to $t_{i+1}$ than is $t_i$. So, even though we could use $y_i + hf(t_i, y_i) = y_i + hk_1$ to approximate $y_{i+1}$ (as done before), we might expect $y_i + hk_2$ to be a better estimate. With this hope in hand, we complete the method by calculating as follows:

$$\begin{aligned}
k_1 &= f(t_i, y_i) \\
k_2 &= f\left(t_i + \frac{h}{2},\, y_i + \frac{h}{2}k_1\right) \\
k_3 &= f(t_{i+1}, y_i + hk_2) \\
y_{i+1} &= y_i + \frac{h}{6}\left[k_1 + 4k_2 + k_3\right].
\end{aligned}$$

For now, we will refer to this method as Simpson’s-ode.

Before trying to assess whether this new method is better than the previous ones, let’s derive a couple more, and compare them all together. The formula

$$\int_{x_0}^{x_0+3h} f(x)\,dx = \frac{3h}{2}\left[f(x_0 + h) + f(x_0 + 2h)\right] + O(h^3 f''(\xi_h))$$

(an open Newton-Cotes formula from section 4.3) leads to the method

$$\begin{aligned}
k_1 &= f(t_i, y_i) \\
k_2 &= f\left(t_i + \frac{h}{3},\, y_i + \frac{h}{3}k_1\right) \\
k_3 &= f\left(t_i + \frac{2h}{3},\, y_i + \frac{2h}{3}k_2\right) \\
y_{i+1} &= y_i + \frac{h}{2}\left[k_2 + k_3\right].
\end{aligned}$$

Can you fill in the steps to derive this method? Answer on page 213. We will call this method open-ode. Finally, we use the stencil

[stencil diagram: nodes at 0, 1, and 2 on the interval from 0 to 3]

to derive yet another integration formula. This is not an open Newton-Cotes formula nor is it a closed Newton-Cotes formula. It is not one that was covered in section 4.3. Perhaps it might be called a “clopen” (half closed and half open) Newton-Cotes formula. Can you derive the corresponding integration method? Details on page 214. The result is

$$\int_{x_0}^{x_0+3h} f(x)\,dx \approx \frac{3h}{4}\left[f(x_0) + 3f(x_0 + 2h)\right],$$

disregarding the error term. This leads to the o.d.e. solver

$$\begin{aligned}
k_1 &= f(t_i, y_i) \\
k_2 &= f\left(t_i + \frac{h}{3},\, y_i + \frac{h}{3}k_1\right) \\
k_3 &= f\left(t_i + \frac{2h}{3},\, y_i + \frac{2h}{3}k_2\right) \\
y_{i+1} &= y_i + \frac{h}{4}\left[k_1 + 3k_3\right].
\end{aligned}$$

We will call this method clopen-ode. Notice two things. First, even though $k_2$ is not used in the final line, it is still computed since it is used to compute $k_3$. Second, the calculations of $k_1$, $k_2$, and $k_3$ are identical to those in the open-ode method. The only difference is how the $k_j$ are combined. The integration methods combine the values of the function at the nodes differently. This idea of using the same $k_j$ for different purposes will come up again!

So now we have three new methods to test out: one based on Simpson’s rule (Simpson’s-ode), one based on an open Newton-Cotes formula (open-ode), and a third based on a “clopen” Newton-Cotes formula (clopen-ode). Can you write test code for comparing the three new formulas (similar to the code used to compare Euler’s method with trapezoidal-ode)? Answer on page 215. Results are summarized in the following Octave output:




          Simpsons        Open      Clopen    Simp err    Open err    Clop err

ans =     17.44806    17.44999    17.45022     0.00220     0.00028     0.00004
ans =     15.28557    15.28953    15.29008     0.00461     0.00065     0.00010
ans =     13.49781    13.50395    13.50494     0.00730     0.00116     0.00017
ans =     12.07297    12.08146    12.08307     0.01036     0.00187     0.00027
ans =     11.00347    11.01450    11.01700     0.01393     0.00290     0.00040
ans =     10.28804    10.30185    10.30566     0.01821     0.00440     0.00059
ans =      9.93523     9.95208     9.95789     0.02354     0.00669     0.00088
ans =      9.96952     9.98969     9.99866     0.03048     0.01031     0.00134


Simpson’s-ode does the poorest job of finding an approximate solution and clopen-ode does the best. But why? 

We’ve done a pretty thorough job of sweeping error analysis under the rug up until now. The bulk of that investigation will happen in the next section, but we can do a quick analysis here. From section 4.3, we know that the trapezoidal rule and the open Newton-Cotes formula we used here both have error terms of $O(h^3)$, while Simpson’s rule has error term $O(h^5)$. The integration methods based on the stencils

[stencil diagram: a single node at 0 on the interval from 0 to 1] and [stencil diagram: nodes at 0, 1, and 2 on the interval from 0 to 3]

(which led to Euler’s method and the clopen method) have yet undetermined error terms. Can you show that their error terms are $O(h^2)$ and $O(h^4)$, respectively? Answer on page 215. Based on the error terms of the underlying integration methods, we should expect these o.d.e. solvers to be, in order from least accurate to most accurate, Euler’s method (based on an $O(h^2)$ integration formula), open-ode (based on an $O(h^3)$ integration formula), clopen-ode (based on an $O(h^4)$ integration formula), and Simpson’s-ode (based on an $O(h^5)$ integration formula); with trapezoidal-ode to be on par with open-ode. Table 6.3 shows the errors in calculating $y(2)$ for o.d.e. 6.2.1 for the five methods of this section using various values of $h$. Since the value of $h$ in each row is half that of the previous row, we would expect the ratio of the errors in consecutive rows to be approximately $\left(\frac{1}{2}\right)^{\ell}$ where the rate of convergence for the method is $O(h^{\ell})$. For Euler’s method, dividing the error in row 3 by that of row 2, we get $\frac{0.55114}{1.0809} \approx 0.510$, and dividing the error in row 6 by that in row 5, we get $\frac{0.07013}{0.1399} \approx 0.501$, for example. This evidence suggests that $\ell = 1$ for Euler’s method, and therefore, Euler’s method has an $O(h)$ convergence. Repeating the same calculation for the other methods yields Table 6.4.
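The ratio calculation just described can be automated. The sketch below is an illustration rather than code from the text: it takes the Euler’s method column of Table 6.3 as data and estimates the exponent $\ell$ from each pair of consecutive rows. The variable names are assumptions made here.

```octave
% Absolute errors for Euler's method, copied from Table 6.3
% (each row uses half the step size of the row before it).
euler_err = [2.0833, 1.0809, 0.55114, 0.27837, 0.1399, 0.07013];

% Ratio of each error to the error in the previous row; about (1/2)^ell.
ratios = euler_err(2:end) ./ euler_err(1:end-1);

% Solve ratio = (1/2)^ell for ell.
rates = log2(1 ./ ratios);

disp(ratios)
disp(rates)
```

Estimates that settle near 1 as the step size shrinks are consistent with $O(h)$ convergence. The same lines can be reused for the other columns of the table by swapping in their errors.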

With the exception of Simpson’s-ode, Table 6.4 suggests that o.d.e. solvers have an error term of one less degree than their underlying (single step) integration formula. In section 4.4 we noted that composite integration formulas also have error terms of one less degree than their corresponding single-step integration formulas (and we made a similar observation about Taylor methods in section 6.2). There is reason to believe in this parallel as the methods proposed in this section are essentially composite integration techniques. So, it should be a little troubling that Simpson’s-ode does not fit the pattern. A deeper exploration of the error term is needed to explain this anomaly.

Table 6.3: A comparison of absolute errors for five o.d.e. solvers

|h|      Euler’s    Trap-ode         Open-ode         Clopen-ode       Simpson’s-ode
1/4      2.0833     0.09375          0.010311         0.0013444        0.030482
1/8      1.0809     0.023437         0.0025929        0.00017446       0.0077168
1/16     0.55114    0.0058594        0.00064977       2.2207(10)^{-5}  0.0019412
1/32     0.27837    0.0014648        0.00016261       2.8008(10)^{-6}  0.00048679
1/64     0.1399     0.00036621       4.0672(10)^{-5}  3.5166(10)^{-7}  0.00012188
1/128    0.07013    9.1553(10)^{-5}  1.017(10)^{-5}   4.4055(10)^{-8}  3.0494(10)^{-5}

Table 6.4: The error terms of five o.d.e. solvers and their underlying integration methods

                     Euler’s   Trap-ode   Open-ode   Clopen-ode   Simpson’s-ode
Integration method   O(h^2)    O(h^3)     O(h^3)     O(h^4)       O(h^5)
O.D.E. solver        O(h)      O(h^2)     O(h^2)     O(h^3)       O(h^2)


Exercises 


1. Derive an o.d.e. solver based on the stencil and corresponding integration formula. 

(a) [stencil diagram]
$$\frac{h}{4}\left(f(x_0) + 3f\left(x_0 + \frac{2}{3}h\right)\right) + O(h^4)$$

(b) [stencil diagram]
$$hf\left(x_0 + \frac{h}{2}\right) + O(h^3)$$


(c) 


M 


Xo + -h ) - 


/(so)) + 0(h 3 


0 12 3 

(d) 

hf (x 0 + ^hj + 0(h 2 ) 


(e) 


(f) 


(g) 



0 


1 2 3 

' * * 

^ (3/ (xo + ^hj + f(x 0 + h)) + 0(h 4 ) 



hf (x 0 + |h) + 0(h 2 ) 


^ (3/ (x 0 + ~ 4 / (*0 + ^hj + 3 hf (x 0 + | ft) ) + 0(h 5 ) 
(h)


(3/ (xo + + f(x 0 + h)j + 0(h 4 


(i) 


1 3= 

0 r V3 


1 + 


v/3 2 


I ( J (*» + ^r' 1 ) + ' ( x ° + ^Jr h 1 1 + 0( '“' ) 


(j) 


o r 1 ~yl 1 1 + vs ,2 


h ( r J , V5-V3 

is ( 5/ r + -2^- 


hj + 8/ ^£0 + ^hj 5/ ^*0 + 


V5 + V3, 

•2V5 


h + 0{h 7 ) 


2. Conduct a numerical experiment on test o.d.e. 6.2.1 to determine the rate of convergence of the method derived in 
question 1. Based on the error term of the integration formula, is the rate of convergence of the o.d.e. solver as 
expected? 

3. Write an Octave function that implements Euler’s method.

4. Write an Octave function that implements trapezoidal-ode.

5. Write an Octave function that implements clopen-ode.

6. Write an Octave function that implements the solver you derived in exercise 1b. This is called the midpoint method or the modified Euler method. It is based on the midpoint rule for integration.

7. Write an Octave function that implements the solver you derived in exercise 1a. This is called Ralston’s method. [A]

8. Use your code from exercise 3 to compute y(2) for the o.d.e. in exercise 1 on page 205 using step size h = 0.05. [S] [A]

9. Use your code from exercise 4 to compute y(2) for the o.d.e. in exercise 1 on page 205 using step size h = 0.05. [S] [A]

10. Use your code from exercise 5 to compute y(2) for the o.d.e. in exercise 1 on page 205 using step size h = 0.05. [S] [A]

11. Use your code from exercise 6 to compute y(2) for the o.d.e. in exercise 1 on page 205 using step size h = 0.05. [S] [A]

12. Use your code from exercise 7 to compute y(2) for the o.d.e. in exercise 1 on page 205 using step size h = 0.05. [S] [A]

Answers 

Filling in the gaps: Beginning with the integration formula

$$\int_{x_0}^{x_0+3h} f(x)\,dx = \frac{3h}{2}\left[f(x_0 + h) + f(x_0 + 2h)\right] + O(h^3 f''(\xi_h)),$$

we “shrink” the interval of integration to $[x_0, x_0 + s]$ by making the substitution $s = 3h$:

$$\int_{x_0}^{x_0+s} f(x)\,dx = \frac{s}{2}\left[f\left(x_0 + \frac{1}{3}s\right) + f\left(x_0 + \frac{2}{3}s\right)\right] + O(s^3 f''(\xi_s)).$$

With the integration formula rephrased in terms of step size $s$, the o.d.e. solving method is

$$y_{i+1} = y_i + \frac{h}{2}\left[f(t_{i+1/3}, y_{i+1/3}) + f(t_{i+2/3}, y_{i+2/3})\right]$$

where we revert to using $h$ for step size. We then use Euler’s method to estimate $y_{i+1/3}$ and $y_{i+2/3}$, starting with $y_{i+1/3}$. That is, we replace $y_{i+1/3}$ by $y_i + \frac{h}{3} f(t_i, y_i)$. Then we estimate $y_{i+2/3}$. Using a multiple-step calculation as before, that gives us

$$\begin{aligned}
k_1 &= f(t_i, y_i) \\
k_2 &= f\left(t_i + \frac{h}{3},\, y_i + \frac{h}{3}k_1\right)
\end{aligned}$$

taking care of the first term in brackets. It remains to estimate $f(t_{i+2/3}, y_{i+2/3})$. But we now have an estimate of $f$ (the derivative of $y$) at $t_i + \frac{h}{3}$, and $t_i + \frac{h}{3}$ is closer to $t_{i+2/3}$ than is $t_i$. So, we approximate $y_{i+2/3}$ by $y_i + \frac{2}{3}hk_2$:

$$\begin{aligned}
k_1 &= f(t_i, y_i) \\
k_2 &= f\left(t_i + \frac{h}{3},\, y_i + \frac{h}{3}k_1\right) \\
k_3 &= f\left(t_i + \frac{2h}{3},\, y_i + \frac{2h}{3}k_2\right) \\
y_{i+1} &= y_i + \frac{h}{2}\left[k_2 + k_3\right].
\end{aligned}$$

Clopen Newton-Cotes: [stencil diagram: nodes at 0, 1, and 2 on the interval from 0 to 3]

For this stencil, $a = x_0$, $b = x_0 + 3h$, and $\theta_i = i$, $i = 0, 1, 2$. Therefore, we will have a system of three equations in the three unknowns. First, the left-hand sides:


$$\begin{aligned}
\int_a^b p_0(x)\,dx &= \int_{x_0}^{x_0+3h} p_0(x)\,dx = \int_{x_0}^{x_0+3h} 1\,dx = (x - x_0)\Big|_{x_0}^{x_0+3h} = 3h, \\
\int_a^b p_1(x)\,dx &= \int_{x_0}^{x_0+3h} p_1(x)\,dx = \int_{x_0}^{x_0+3h} (x - x_0)\,dx = \frac{1}{2}(x - x_0)^2\Big|_{x_0}^{x_0+3h} = \frac{9}{2}h^2, \\
\int_a^b p_2(x)\,dx &= \int_{x_0}^{x_0+3h} p_2(x)\,dx = \int_{x_0}^{x_0+3h} (x - x_0)^2\,dx = \frac{1}{3}(x - x_0)^3\Big|_{x_0}^{x_0+3h} = 9h^3.
\end{aligned}$$

Now putting them together with the right-hand sides (and swapping sides):

$$\begin{aligned}
\sum_{i=0}^{2} (\theta_i h)^0 a_i &= a_0 + a_1 + a_2 = 3h \\
\sum_{i=0}^{2} (\theta_i h)^1 a_i &= ha_1 + 2ha_2 = \frac{9}{2}h^2 \\
\sum_{i=0}^{2} (\theta_i h)^2 a_i &= h^2a_1 + 4h^2a_2 = 9h^3.
\end{aligned}$$

This system is small enough to solve by hand (without the use of a computer algebra system). Subtracting $h$ times the second equation from the third:

$$\begin{aligned}
h^2a_1 + 4h^2a_2 &= 9h^3 \\
-\left(h^2a_1 + 2h^2a_2 = \tfrac{9}{2}h^3\right) & \\
2h^2a_2 &= \tfrac{9}{2}h^3 \quad\Rightarrow\quad a_2 = \tfrac{9}{4}h.
\end{aligned}$$

Substituting $a_2 = \frac{9}{4}h$ into $ha_1 + 2ha_2 = \frac{9}{2}h^2$, we can solve for $a_1$:

$$\begin{aligned}
ha_1 + 2h \cdot \tfrac{9}{4}h &= \tfrac{9}{2}h^2 \\
ha_1 + \tfrac{9}{2}h^2 &= \tfrac{9}{2}h^2 \\
ha_1 &= 0 \quad\Rightarrow\quad a_1 = 0.
\end{aligned}$$

Substituting $a_1 = 0$ and $a_2 = \frac{9}{4}h$ into $a_0 + a_1 + a_2 = 3h$, we can solve for $a_0$:

$$a_0 + 0 + \tfrac{9}{4}h = 3h \quad\Rightarrow\quad a_0 = \tfrac{3}{4}h.$$

Therefore, $\sum_{i=0}^{2} a_i f(x_0 + \theta_i h) = \frac{3}{4}h \cdot f(x_0) + 0 \cdot f(x_0 + h) + \frac{9}{4}h \cdot f(x_0 + 2h)$ and the integration formula is

$$\int_{x_0}^{x_0+3h} f(x)\,dx \approx \frac{3h}{4}\left[f(x_0) + 3f(x_0 + 2h)\right].$$
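As a cross-check on the hand calculation, the same three equations can be handed to Octave’s backslash operator. This is only a sketch added for illustration: it fixes $h = 1$ (an arbitrary choice made here), so the expected weights are $a_0 = 3/4$, $a_1 = 0$, and $a_2 = 9/4$.

```octave
h = 1;  % any positive h works; h = 1 keeps the numbers simple

% Rows: sum of (theta_i*h)^p * a_i for p = 0, 1, 2 with theta = 0, 1, 2.
A = [1,   1,     1;
     0,   h,     2*h;
     0,   h^2,   4*h^2];

% Right-hand sides: the integrals of 1, (x-x0), (x-x0)^2 over [x0, x0+3h].
b = [3*h; 9/2*h^2; 9*h^3];

a = A \ b;   % solve the linear system for the weights a_0, a_1, a_2
disp(a')
```

The computed weights should match the values found by hand, up to rounding.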


Test code: Comparing Simpson’s, open, and clopen methods:

t=4;                                 % initial time
h=-1/4;                              % step size (negative: stepping from t=4 down to t=2)
f=inline("-y/t+t^2");                % right hand side of the o.d.e., called as f(t,y)
exact=inline("t^3/4+16/t");          % exact solution, for computing errors
simp=20;                             % initial value y(4)=20 for each method
open=20;
clop=20;
disp('          Simpsons        Open      Clopen    Simp err    Open err    Clop err')
disp(' ')
for i=1:8
  k1simp=f(t,simp);
  k1open=f(t,open);
  k1clop=f(t,clop);
  k2simp=f(t+h/2,simp+h/2*k1simp);
  k2open=f(t+h/3,open+h/3*k1open);
  k2clop=f(t+h/3,clop+h/3*k1clop);
  k3simp=f(t+h,simp+h*k2simp);
  k3open=f(t+2*h/3,open+2*h/3*k2open);
  k3clop=f(t+2*h/3,clop+2*h/3*k2clop);
  simp=simp+h/6*(k1simp+4*k2simp+k3simp);   % Simpson's-ode
  open=open+h/2*(k2open+k3open);            % open-ode
  clop=clop+h/4*(k1clop+3*k3clop);          % clopen-ode
  t=t+h;
  x=exact(t);
  sierr=abs(simp-x);
  operr=abs(open-x);
  clerr=abs(clop-x);
  sprintf('%12.5g%12.5g%12.5g%12.5g%12.5g%12.5g',simp,open,clop,sierr,operr,clerr)
end%for

This test code may be downloaded at the companion website (rungeKuttaDemo2.m).
Error terms: The error term for

[stencil diagram]

is derived in the section 4.3 solutions. See page 273. The error term for

[stencil diagram]

is derived similarly. We are given that the error is $O(h^2)$, so we can skip the discovery. Expanding $f(x)$ in a Taylor polynomial with error term,

$$f(x) = f(x_0) + (x - x_0)f'(\xi_x).$$

So

$$\begin{aligned}
\int_{x_0}^{x_0+h} f(x)\,dx - hf(x_0) &= \int_{x_0}^{x_0+h} \left(f(x_0) + (x - x_0)f'(\xi_x)\right)dx - hf(x_0) \\
&= xf(x_0)\Big|_{x_0}^{x_0+h} + \int_{x_0}^{x_0+h} (x - x_0)f'(\xi_x)\,dx - hf(x_0) \\
&= hf(x_0) + \int_{x_0}^{x_0+h} (x - x_0)f'(\xi_x)\,dx - hf(x_0) \\
&= \int_{x_0}^{x_0+h} (x - x_0)f'(\xi_x)\,dx.
\end{aligned}$$

By the weighted mean value theorem, there exists $c \in (x_0, x_0 + h)$ such that $\int_{x_0}^{x_0+h} (x - x_0)f'(\xi_x)\,dx = f'(c)\int_{x_0}^{x_0+h} (x - x_0)\,dx = \frac{1}{2}f'(c)h^2$. Hence

$$\int_{x_0}^{x_0+h} f(x)\,dx - hf(x_0) = \frac{1}{2}f'(c)h^2 \le Mh^2 f'(\xi_h)$$

where we have replaced $c$ by $\xi_h$.
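A quick numerical experiment can make the $O(h^2)$ claim for the one-node formula $hf(x_0)$ more tangible. The sketch below is an illustration added here, with the test function $f(x) = e^x$ and $x_0 = 0$ chosen as assumptions because the exact integral, $e^h - 1$, is known in closed form.

```octave
% Error of the left-endpoint rule h*f(x0) for f(x) = exp(x) at x0 = 0,
% where the exact integral over [0, h] is exp(h) - 1.
left_err = @(h) abs((exp(h) - 1) - h*exp(0));

e1 = left_err(0.1);    % error with step size h = 0.1
e2 = left_err(0.05);   % error with half that step size

% If the error is O(h^2), halving h should divide the error by about 4.
ratio = e1/e2;
printf("%g %g %g\n", e1, e2, ratio);
```

A ratio close to 4 is what an error proportional to $h^2$ would produce.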
6.4 Error Analysis

Section 6.3 ended with the mysterious (and unsettling?) observation that Simpson’s-ode did not live up to expectations. Based on other o.d.e. solvers, we would expect the rate of convergence of Simpson’s-ode to be $O(h^4)$ since Simpson’s rule, on which Simpson’s-ode is based, has local truncation error $O(h^5)$.

The explanation is rooted in the fact that we are solving an o.d.e. of the form $\dot{y} = f(t,y)$, in which the derivative is a function of two variables, $t$ and $y$. To understand the error analysis, heavy use of partial derivatives and the chain rule is required. As ever, we consult Taylor’s theorem and write

$$y(t_0 + h) = y(t_0) + h\dot{y}(t_0) + \frac{1}{2}h^2\ddot{y}(t_0) + \frac{1}{6}h^3\dddot{y}(t_0) + \cdots.$$

Each derivative of $y$ can be replaced by some function of $f$ and its partial derivatives, starting with $\dot{y}$, which is given by the o.d.e. we are trying to solve.

$$\begin{aligned}
\dot{y} &= f(t,y) \\
\ddot{y} &= \frac{d}{dt}\dot{y} = \frac{d}{dt}f(t,y) = f_t(t,y) + f_y(t,y)\,\dot{y} = f_t(t,y) + f_y(t,y) \cdot f(t,y)
\end{aligned}$$

Eliminating the explicit use of arguments $t$ and $y$,

$$\begin{aligned}
\dot{y} &= f \\
\ddot{y} &= f_t + f_y f \\
\dddot{y} &= f_{tt} + f_{ty}f + (f_{yt} + f_{yy}f)f + f_y(f_t + f_y f) \\
&= f_{tt} + 2f_{ty}f + f_{yy}f^2 + f_t f_y + f_y^2 f
\end{aligned}$$

so $y(t_0 + h) = y(t_0) + h\dot{y}(t_0) + \frac{1}{2}h^2\ddot{y}(t_0) + \frac{1}{6}h^3\dddot{y}(t_0) + \cdots$ in terms of $f$ is

$$y(t_0 + h) = y(t_0) + hf + \frac{1}{2}h^2(f_t + f_y f) + \frac{1}{6}h^3(f_{tt} + 2f_{ty}f + f_{yy}f^2 + f_t f_y + f_y^2 f) + \cdots;$$

and as an o.d.e. solver (replacing $y(t_0)$ by $y_i$ and $y(t_0 + h)$ by $y_{i+1}$),

$$y_{i+1} = y_i + hf + \frac{1}{2}h^2(f_t + f_y f) + \frac{1}{6}h^3(f_{tt} + 2f_{ty}f + f_{yy}f^2 + f_t f_y + f_y^2 f) + \cdots. \tag{6.4.1}$$

Rewriting high degree Taylor polynomials in terms of $f$ quickly becomes complicated. We will focus on analysis requiring only $\dot{y}$, $\ddot{y}$, and $\dddot{y}$.
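As a sanity check on these replacement formulas, here is a worked evaluation added for illustration. It uses test o.d.e. 6.2.1, $\dot{y} = -\frac{y}{t} + t^2$ with $y(4) = 20$, and partial derivatives of $f(t,y) = -\frac{y}{t} + t^2$ computed by hand, all evaluated at $(t, y) = (4, 20)$:

```latex
\begin{aligned}
f &= -\tfrac{20}{4} + 4^2 = 11, \qquad
f_t = \tfrac{20}{4^2} + 2(4) = 9.25, \qquad
f_y = -\tfrac{1}{4}, \\
f_{tt} &= -\tfrac{2(20)}{4^3} + 2 = 1.375, \qquad
f_{ty} = \tfrac{1}{4^2} = 0.0625, \qquad
f_{yy} = 0, \\
\ddot{y} &= f_t + f_y f = 9.25 - 2.75 = 6.5, \\
\dddot{y} &= f_{tt} + 2f_{ty}f + f_{yy}f^2 + f_t f_y + f_y^2 f \\
          &= 1.375 + 1.375 + 0 - 2.3125 + 0.6875 = 1.125.
\end{aligned}
```

The value $1.125 = \frac{9}{8}$ agrees with direct differentiation of this particular o.d.e., $\dddot{y} = -\frac{6y}{t^3} + 3$, which gives $-\frac{120}{64} + 3 = \frac{9}{8}$ at the same point.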

The o.d.e. solvers of section 6.3 have the form

$$\begin{aligned}
k_1 &= f(t_i, y_i) \\
k_2 &= f(t_i + \beta_2 h,\, y_i + \beta_2 h k_1) \\
k_3 &= f(t_i + \beta_3 h,\, y_i + \beta_3 h k_2) \\
&\;\;\vdots \\
k_s &= f(t_i + \beta_s h,\, y_i + \beta_s h k_{s-1}) \\
y_{i+1} &= y_i + h\left[\alpha_1 k_1 + \alpha_2 k_2 + \alpha_3 k_3 + \cdots + \alpha_s k_s\right].
\end{aligned} \tag{6.4.2}$$

We did not actually see any o.d.e. solvers with $s > 3$ in section 6.3, but the process we followed would clearly require it should there be more than three nodes in the underlying integration formula.
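Because every solver of section 6.3 fits the pattern (6.4.2), a single routine parameterized by the $\alpha_j$ and $\beta_j$ can stand in for all of them. The following is a minimal sketch under stated assumptions, not code from the text: the function name `chain_step` and its argument list are hypothetical, and `beta(1)` is a placeholder that is never used.

```octave
1;  % a leading command so Octave treats this file as a script, not a function file

% Hypothetical helper (name and signature are assumptions for illustration):
% one step of a solver of the form (6.4.2) with weights alpha(1..s)
% and node parameters beta(2..s).
function ynew = chain_step(f, t, y, h, alpha, beta)
  s = numel(alpha);
  k = zeros(s, 1);
  k(1) = f(t, y);
  for j = 2:s
    % each k_j uses only the previous k_{j-1}, as in (6.4.2)
    k(j) = f(t + beta(j)*h, y + beta(j)*h*k(j-1));
  end
  ynew = y + h*(alpha(:).' * k);   % weighted sum of the k_j
end

% Usage: clopen-ode on test o.d.e. 6.2.1, eight steps of size -1/4 from y(4) = 20.
f = @(t, y) -y/t + t^2;
alpha = [1/4, 0, 3/4];
beta  = [0, 1/3, 2/3];
t = 4; y = 20; h = -1/4;
for i = 1:8
  y = chain_step(f, t, y, h, alpha, beta);
  t = t + h;
end
printf("%12.7g\n", y);
```

With the clopen-ode parameters the final value should match the last Clopen entry reported in section 6.3 (9.99866). Swapping in `alpha = [1/2, 1/2]` and `beta = [0, 1]` gives trapezoidal-ode.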

The difference between $y(t_0 + h)$ from (6.4.1) and $y_{i+1}$ from (6.4.2) is the local truncation error of the o.d.e. solver (the error in taking a single step). In order to write this truncation error in the form $O(h^{\ell})$, though, we need to expand each $k_j$ in its Taylor polynomial. Taylor’s theorem in two variables is needed.
Theorem 8. Suppose $f(t,y)$ and all its partial derivatives of order $n+1$ and lower are continuous on the rectangle $D = \{(t,y) : a \le t \le b,\ c \le y \le d\}$, and let $(t_0, y_0) \in D$. Then for every $(t,y) \in D$, there exist $\xi \in (a,b)$ and $\mu \in (c,d)$ such that

$$\begin{aligned}
f(t,y) ={}& f(t_0,y_0) + \left[(t - t_0) \cdot f_t(t_0,y_0) + (y - y_0) \cdot f_y(t_0,y_0)\right] \\
&+ \frac{1}{2}\left[(t - t_0)^2 f_{tt}(t_0,y_0) + 2(t - t_0)(y - y_0) \cdot f_{ty}(t_0,y_0) + (y - y_0)^2 f_{yy}(t_0,y_0)\right] \\
&+ \cdots \\
&+ \frac{1}{n!}\sum_{j=0}^{n} \binom{n}{j}(t - t_0)^{n-j}(y - y_0)^{j}\,\frac{\partial^n f}{\partial t^{n-j}\,\partial y^{j}}(t_0,y_0) \\
&+ \frac{1}{(n+1)!}\sum_{j=0}^{n+1} \binom{n+1}{j}(t - t_0)^{n+1-j}(y - y_0)^{j}\,\frac{\partial^{n+1} f}{\partial t^{n+1-j}\,\partial y^{j}}(\xi,\mu).
\end{aligned}$$

As with Taylor’s theorem (of one variable), the first n + 1 terms form the Taylor polynomial and the last term is 
the remainder term. 

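To illustrate, we let $f(t,y) = -\frac{y}{t} + t^2$ and compute its second Taylor polynomial with remainder term expanded about $(t_0, y_0) = (1, 1)$. For this, we will need all partial derivatives of $f$ up to and including order 3.

$$\begin{aligned}
f_t &= \frac{y}{t^2} + 2t \\
f_y &= -\frac{1}{t} \\
f_{tt} &= -\frac{2y}{t^3} + 2 \\
f_{ty} = f_{yt} &= \frac{1}{t^2} \\
f_{yy} &= 0 \\
f_{ttt} &= \frac{6y}{t^4} \\
f_{tty} = f_{tyt} = f_{ytt} &= -\frac{2}{t^3} \\
f_{tyy} = f_{yty} = f_{yyt} &= 0 \\
f_{yyy} &= 0.
\end{aligned}$$

It follows that

$$\begin{aligned}
f(1,1) &= 0 \\
f_t(1,1) &= 3 \\
f_y(1,1) &= -1 \\
f_{tt}(1,1) &= 0 \\
f_{ty}(1,1) &= 1 \\
f_{yy}(1,1) &= 0 \\
f_{ttt}(\xi,\mu) &= \frac{6\mu}{\xi^4} \\
f_{tty}(\xi,\mu) &= -\frac{2}{\xi^3} \\
f_{tyy}(\xi,\mu) &= 0 \\
f_{yyy}(\xi,\mu) &= 0.
\end{aligned}$$

Therefore, the second Taylor polynomial for $f(t,y)$ is

$$\begin{aligned}
T_2(t,y) &= f(1,1) + \left[(t - 1) \cdot f_t(1,1) + (y - 1) \cdot f_y(1,1)\right] \\
&\quad + \frac{1}{2}\left[(t - 1)^2 f_{tt}(1,1) + 2(t - 1)(y - 1) \cdot f_{ty}(1,1) + (y - 1)^2 f_{yy}(1,1)\right] \\
&= 0 + 3(t - 1) - (y - 1) + 0(t - 1)^2 + (t - 1)(y - 1) + 0(y - 1)^2 \\
&= 3(t - 1) - (y - 1) + (t - 1)(y - 1)
\end{aligned}$$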
with remainder term

$$\begin{aligned}
R_2(t,y) &= \frac{1}{6}\left[(t - 1)^3 f_{ttt}(\xi,\mu) + 3(t - 1)^2(y - 1) f_{tty}(\xi,\mu) + 3(t - 1)(y - 1)^2 f_{tyy}(\xi,\mu) + (y - 1)^3 f_{yyy}(\xi,\mu)\right] \\
&= \frac{1}{6}\left[(t - 1)^3 \cdot \frac{6\mu}{\xi^4} - 3(t - 1)^2(y - 1) \cdot \frac{2}{\xi^3} + 3(t - 1)(y - 1)^2 \cdot 0 + (y - 1)^3 \cdot 0\right] \\
&= (t - 1)^3\frac{\mu}{\xi^4} - (t - 1)^2(y - 1)\frac{1}{\xi^3}.
\end{aligned}$$
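To see the polynomial and its remainder at work, here is a small numerical check added for illustration. The evaluation point $(t, y) = (1.1, 0.9)$ is an arbitrary choice near $(1, 1)$, not one taken from the text.

```latex
\begin{aligned}
f(1.1, 0.9) &= -\tfrac{0.9}{1.1} + (1.1)^2 \approx -0.81818 + 1.21 = 0.39182, \\
T_2(1.1, 0.9) &= 3(0.1) - (-0.1) + (0.1)(-0.1) = 0.39, \\
f(1.1, 0.9) - T_2(1.1, 0.9) &\approx 0.00182, \\
R_2(1.1, 0.9) &= (0.1)^3\frac{\mu}{\xi^4} - (0.1)^2(-0.1)\frac{1}{\xi^3}
              = 0.001\left(\frac{\mu}{\xi^4} + \frac{1}{\xi^3}\right).
\end{aligned}
```

With $\xi$ and $\mu$ both near 1, the remainder is close to $0.002$, which is consistent in size with the observed difference of about $0.00182$.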

More generally, suppose we are interested in Taylor polynomial expansions of expressions like $f(t_i + \beta_j h,\, y_i + \beta_j h k_{j-1})$, as we have in our o.d.e. solvers. Expanding about $(t_i, y_i)$, we let $t_0 = t_i$, $y_0 = y_i$, $t = t_i + \beta_j h$, and $y = y_i + \beta_j h k_{j-1}$. Thus $t - t_0 = \beta_j h$ and $y - y_0 = \beta_j h k_{j-1}$, and the second Taylor polynomial without explicit listing of the arguments $t_i$ and $y_i$ on the right-hand side is

$$f(t_i + \beta_j h,\, y_i + \beta_j h k_{j-1}) = f + h\beta_j\left[f_t + k_{j-1}f_y\right] + \frac{1}{2}h^2\beta_j^2\left[f_{tt} + 2k_{j-1}f_{ty} + k_{j-1}^2 f_{yy}\right]$$

with remainder term $O(h^3)$.

In particular, when we set $j = 1$, $\beta_j = \beta_1 = 0$, we get

$$k_1 = f(t_i, y_i) = f.$$

When we set $j = 2$,

$$\begin{aligned}
k_2 &= f(t_i + \beta_2 h,\, y_i + \beta_2 h k_1) \\
&= f + h\beta_2\left[f_t + f f_y\right] + \frac{1}{2}h^2\beta_2^2\left[f_{tt} + 2f f_{ty} + f^2 f_{yy}\right] + O(h^3).
\end{aligned}$$

The calculation of $k_3$ is a little bit messier since it involves $k_2$. Before diving in headlong, though, consider what we will do with $k_3$ first. After computing $k_1$, $k_2$, and $k_3$, we will substitute each into the formula

$$y_{i+1} = y_i + h\left[\alpha_1 k_1 + \alpha_2 k_2 + \alpha_3 k_3\right] \tag{6.4.3}$$

and subtract the result from (6.4.1). For purposes of this discussion, we seek a method with local truncation error $O(h^4)$. Therefore, we need only retain constant terms and terms containing a factor of $h^3$, $h^2$, or $h$ in equation (6.4.3). Terms with higher powers of $h$ are irrelevant. They will be assumed (or should I say consumed?) by the $O(h^4)$. Since the sum $\alpha_1 k_1 + \alpha_2 k_2 + \alpha_3 k_3$ is multiplied by $h$, we need only retain terms with factors of up to $h^2$ in $k_1$, $k_2$, and $k_3$. Taking a look at the expansion of $k_3$:

$$\begin{aligned}
k_3 &= f(t_i + \beta_3 h,\, y_i + \beta_3 h k_2) \\
&= f + h\beta_3\left[f_t + k_2 f_y\right] + \frac{1}{2}h^2\beta_3^2\left[f_{tt} + 2k_2 f_{ty} + k_2^2 f_{yy}\right] + O(h^3)
\end{aligned}$$

we see only the term $\frac{1}{2}h^2\beta_3^2 \cdot k_2^2 f_{yy}$ contains $k_2^2$, and it already has a factor of $h^2$. Consequently, we only need to include the constant term of $k_2^2$. The rest of the terms of $k_2^2$ become part of the $O(h^4)$. That’s not so bad!

$$k_2^2 = f^2 + O(h).$$

Similarly, when we substitute expressions for $k_2$ into $k_3$, we will be careful to avoid any terms that would give a factor of $h$ to any power greater than 2:

$$\begin{aligned}
k_3 &= f + h\beta_3\left[f_t + \left(f + h\beta_2\left[f_t + f f_y\right]\right)f_y\right] \\
&\quad + \frac{1}{2}h^2\beta_3^2\left[f_{tt} + 2(f)f_{ty} + (f^2)f_{yy}\right] + O(h^3) \\
&= f + h\beta_3 f_t + h\beta_3 f f_y + h^2\beta_2\beta_3\left(f_t f_y + f f_y^2\right) \\
&\quad + \frac{1}{2}h^2\beta_3^2\left[f_{tt} + 2f f_{ty} + f^2 f_{yy}\right] + O(h^3).
\end{aligned}$$


After all that detailed computation, now is a good time to lean back and take a look at what we have so far. 
We have expanded all the terms of (6.4.2) for s = 3 and are ready to compare the result to the Taylor expansion 
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of the o.d.e. in (6.4.1). The difference of the two is the local truncation error, so we will be interested in the least 
power of h that remains after subtraction. Copying the two equations here for convenience, we are subtracting 

Di+ 1 = Vi + hf + -fr 2 (/t + fvf) + 0 h 3 (ftt + %ftyf + fyyf~ + ftfy + fyf ) + 0 (h 4 ) 


from 

Ui + i = Vi + h [aiki + a 2 k 2 + a 3 k 3 ] 

= yi + haiki + /ict 2 fc 2 + ha 3 k 3 
= Ui + haif 

+ha 2 + h/3 2 [f t + f f y \ + -h 2 Pl [f tt + 2 / f ty + f 2 f vy \ + 

+ ha 3 (7 + hfoft + hfoffy + h 2 p 2 p 3 (f t fy + ffl) + l -h 2 pi [ftt + 2 ffty + f fyy\ + O . 

The constant term (term containing no factor of h ) for each equation is simply yi, so no constant will remain after 
subtraction. The difference of the terms involving h is hf — {haif + ha 2 f + ha 3 f) = hf( 1 — (au + a 2 + a)), so if 
there is to be no h left in the difference, we must have 


a i + a 2 + a 3 — 1 - 

The difference of the terms involving h 2 f t is \h 2 f t - (ft 2 a 2 /3 2 /t + h 2 a 3 /3 3 ft ) = h 2 f t {\ - (a 2l d 2 + a 3 /3 3 )), so if there 
is to be no h 2 f t left in the difference, we must have 


a 2 /3 2 + a 3 p 3 — 

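Similarly, we consider the differences of the rest of the terms to get the following conditions on the $\alpha_j$ and $\beta_j$.

    term               leads to condition
    $h^2 f_y f$        $\alpha_2\beta_2 + \alpha_3\beta_3 = \frac{1}{2}$
    $h^3 f_{tt}$       $\alpha_2\beta_2^2 + \alpha_3\beta_3^2 = \frac{1}{3}$
    $h^3 f_{ty} f$     $\alpha_2\beta_2^2 + \alpha_3\beta_3^2 = \frac{1}{3}$
    $h^3 f_{yy} f^2$   $\alpha_2\beta_2^2 + \alpha_3\beta_3^2 = \frac{1}{3}$
    $h^3 f_t f_y$      $\alpha_3\beta_2\beta_3 = \frac{1}{6}$
    $h^3 f_y^2 f$      $\alpha_3\beta_2\beta_3 = \frac{1}{6}$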


We have considered all 8 different terms, but have only arrived at 4 distinct conditions: 


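$$\begin{aligned}
\alpha_1 + \alpha_2 + \alpha_3 &= 1 \\
\alpha_2\beta_2 + \alpha_3\beta_3 &= \frac{1}{2} \\
\alpha_2\beta_2^2 + \alpha_3\beta_3^2 &= \frac{1}{3} \\
\alpha_3\beta_2\beta_3 &= \frac{1}{6}.
\end{aligned} \tag{6.4.4}$$

Since we have 5 variables and only 4 conditions, we should think that there are multiple o.d.e. solvers of the form
(6.4.2) with $s = 3$ and local truncation error $O(h^4)$.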

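Evidence from section 6.3 suggests that clopen-ode should have local truncation error $O(h^4)$. Let’s check. For
that method, we have

$$\alpha_1 = \frac{1}{4}, \quad \alpha_2 = 0, \quad \alpha_3 = \frac{3}{4}, \quad \beta_2 = \frac{1}{3}, \quad \beta_3 = \frac{2}{3},$$

so

$$\begin{aligned}
\alpha_1 + \alpha_2 + \alpha_3 &= \frac{1}{4} + 0 + \frac{3}{4} = 1 \\
\alpha_2\beta_2 + \alpha_3\beta_3 &= 0 \cdot \frac{1}{3} + \frac{3}{4} \cdot \frac{2}{3} = \frac{1}{2} \\
\alpha_2\beta_2^2 + \alpha_3\beta_3^2 &= 0 \cdot \left(\frac{1}{3}\right)^2 + \frac{3}{4} \cdot \left(\frac{2}{3}\right)^2 = \frac{1}{3} \\
\alpha_3\beta_2\beta_3 &= \frac{3}{4} \cdot \frac{1}{3} \cdot \frac{2}{3} = \frac{1}{6}.
\end{aligned}$$

Indeed, clopen-ode satisfies all the conditions of an o.d.e. solver with local truncation error (at least) $O(h^4)$. We
would actually have to show that at least one term containing an $h^4$ remains in the difference to prove that the
local truncation error is not of greater degree.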

Before finally answering the question of what happened to Simpson’s-ode, our hard work so far is sufficient
to check that trapezoidal-ode and open-ode have local truncation error $O(h^3)$ and that Euler’s method has local
truncation error $O(h^2)$. For trapezoidal-ode, we have $\alpha_1 = \alpha_2 = \frac{1}{2}$, $\alpha_3 = 0$, $\beta_2 = 1$, and $\beta_3$ undefined (we may
assign any particular number we choose since having $\alpha_3 = 0$ makes $\beta_3$ irrelevant to the method), which gives us

$$\begin{aligned}
\alpha_1 + \alpha_2 + \alpha_3 &= \frac{1}{2} + \frac{1}{2} + 0 = 1 \\
\alpha_2\beta_2 + \alpha_3\beta_3 &= \frac{1}{2} \cdot 1 + 0 = \frac{1}{2} \\
\alpha_2\beta_2^2 + \alpha_3\beta_3^2 &= \frac{1}{2} \cdot 1 + 0 = \frac{1}{2} \neq \frac{1}{3} \\
\alpha_3\beta_2\beta_3 &= 0 \neq \frac{1}{6}.
\end{aligned}$$

The first two conditions are satisfied, but the last two are not. Recall, though, that the first two conditions were
derived from the $h$ and $h^2$ terms while the last two conditions were derived from the $h^3$ terms. So, for
trapezoidal-ode, the local truncation error is $O(h^3)$.

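For Euler’s method, we have $\alpha_1 = 1$, $\alpha_2 = \alpha_3 = 0$, and $\beta_2$ and $\beta_3$ undefined (or whatever we choose), which
gives us

$$\begin{aligned}
\alpha_1 + \alpha_2 + \alpha_3 &= 1 + 0 + 0 = 1 \\
\alpha_2\beta_2 + \alpha_3\beta_3 &= 0 + 0 = 0 \neq \frac{1}{2} \\
\alpha_2\beta_2^2 + \alpha_3\beta_3^2 &= 0 + 0 = 0 \neq \frac{1}{3} \\
\alpha_3\beta_2\beta_3 &= 0 \neq \frac{1}{6}.
\end{aligned}$$

The second equation, which was derived from terms involving $h^2$, is not satisfied but the first equation, which was
derived from terms involving $h$, is, so the local truncation error for Euler’s method is $O(h^2)$.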

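Finally, for Simpson’s-ode, we have $\alpha_1 = \frac{1}{6}$, $\alpha_2 = \frac{2}{3}$, $\alpha_3 = \frac{1}{6}$, $\beta_2 = \frac{1}{2}$, and $\beta_3 = 1$, which gives us

$$\begin{aligned}
\alpha_1 + \alpha_2 + \alpha_3 &= \frac{1}{6} + \frac{2}{3} + \frac{1}{6} = 1 \\
\alpha_2\beta_2 + \alpha_3\beta_3 &= \frac{2}{3} \cdot \frac{1}{2} + \frac{1}{6} \cdot 1 = \frac{1}{2} \\
\alpha_2\beta_2^2 + \alpha_3\beta_3^2 &= \frac{2}{3}\left(\frac{1}{2}\right)^2 + \frac{1}{6}(1)^2 = \frac{1}{3} \\
\alpha_3\beta_2\beta_3 &= \frac{1}{6} \cdot \frac{1}{2} \cdot 1 = \frac{1}{12} \neq \frac{1}{6}.
\end{aligned}$$

The first two equations are satisfied, so the local truncation error is (at least) $O(h^3)$, but the last equation is
not satisfied, so the local truncation error is no more than $O(h^3)$. No terms containing factors of $h$ or $h^2$ (that
don’t also contain higher powers of $h$) appear in the local truncation error, but the term $h^3\alpha_3\beta_2\beta_3(f_t f_y + f f_y^2) =
\frac{1}{12}h^3(f_t f_y + f f_y^2)$ does, so it is $O(h^3)$.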
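
The four checks above are pure arithmetic, so they can be repeated mechanically. What follows is a minimal sketch in Python (not the Octave used elsewhere in this book) with exact fractions. The function name `conditions_met` and the dictionary `methods` are made up for this illustration, an unused $\beta$ is set to 0 as the text permits, and the open-ode parameters ($\alpha = 0, \frac{1}{2}, \frac{1}{2}$ with $\beta_2 = \frac{1}{3}$, $\beta_3 = \frac{2}{3}$) are taken from its description in section 6.5.

```python
from fractions import Fraction as F


def conditions_met(a1, a2, a3, b2, b3):
    """Report which of the four conditions in (6.4.4) hold, in order."""
    return [
        a1 + a2 + a3 == 1,
        a2 * b2 + a3 * b3 == F(1, 2),
        a2 * b2**2 + a3 * b3**2 == F(1, 3),
        a3 * b2 * b3 == F(1, 6),
    ]


# Each entry is (alpha1, alpha2, alpha3, beta2, beta3).
# An undefined beta is assigned 0 since its alpha is 0 anyway.
methods = {
    "Euler": (F(1), F(0), F(0), F(0), F(0)),
    "trapezoidal-ode": (F(1, 2), F(1, 2), F(0), F(1), F(0)),
    "open-ode": (F(0), F(1, 2), F(1, 2), F(1, 3), F(2, 3)),
    "clopen-ode": (F(1, 4), F(0), F(3, 4), F(1, 3), F(2, 3)),
    "Simpson's-ode": (F(1, 6), F(2, 3), F(1, 6), F(1, 2), F(1)),
}

report = {name: conditions_met(*p) for name, p in methods.items()}
for name, flags in report.items():
    print(name, flags)
```

Only clopen-ode passes all four checks. Simpson’s-ode passes the first three and fails the last, which is exactly the failure identified above.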
A Note About Convention and Practice

We have derived five o.d.e. solvers so far with little nod to established practice. It’s time to fix that. What we have
been calling trapezoidal-ode (since it was derived from the trapezoidal rule) is better known as the improved Euler
method, though some will refer to it as the explicit trapezoidal method. What we have been calling clopen-ode
is better known as Heun’s third order method. These methods can easily be found in the literature. They are
prototypical examples of efficient methods. The improved Euler method requires two function evaluations per step
and gives a local truncation error $O(h^3)$. Heun’s third order method requires three function evaluations per step
and gives a local truncation error $O(h^4)$.

What we have been calling open-ode has not been named as it would never be used in practice. It is not an efficient
method, requiring three function evaluations but having a local truncation error of only $O(h^3)$. Consequently,
you are not likely to see it appear in the literature. Heun’s third order method or the improved Euler method
would both be preferable to open-ode. Heun’s third order method gives a
smaller truncation error for the same amount of computation (three function evaluations) and the improved Euler
method gives the same truncation error for less computation (two function evaluations). Simpson’s-ode has the
same shortcomings as open-ode, and thus you are not likely to see it in the literature either. It is also an inefficient
method.

Methods of the form (6.4.2) are part of a class of methods called Runge-Kutta methods, named after the German 
mathematicians Carl Runge and Martin Kutta. The basic idea for such methods was laid out by Runge in a paper 
published in 1895, where Runge introduced the improved Euler method and others. His work was continued by Heun, 
whose paper of 1900 brought us Heun’s third order method and others. In 1901, Kutta derived the most famous
Runge-Kutta method, what is sometimes now referred to as the classic Runge-Kutta method or the Runge-Kutta 
method of order 4, RK4. We will see shortly that it is a modification of Simpson’s-ode. [7] 

Higher Order Methods 

Higher order Runge-Kutta methods can be derived by considering methods of the form (6.4.2) with a number of
stages, $s > 3$. Of course higher order methods must satisfy more conditions. In fact, the number of conditions
grows faster as the desired order increases than does the number of variables as the number of stages increases. In
other words, there is a point where the number of stages to achieve order $p$ exceeds $p$. Order 1 methods can be
derived with one stage (Euler’s method) and no fewer. Order 2 methods can be derived with two stages (the improved
Euler method) and no fewer. Order 3 methods can be derived with three stages (Heun’s third order method) and no
fewer. Order 4 methods can be derived with four stages (example upcoming) and no fewer. However, order $p$ methods
with $p > 4$ require a number of stages $s > p$, which, in turn, means more than $p$ function evaluations. So, the most
efficient methods are to be found with order 4 or less.

Simpson’s-ode failed to live up to its potential because it did not have enough stages, not because there is no
Simpson’s-rule-derived formula with local truncation error $O(h^5)$. The classic Runge-Kutta method of order 4 (local
truncation error $O(h^5)$) has four stages and is given by

$$\begin{aligned}
k_1 &= f(t_i, y_i) \\
k_2 &= f\left(t_i + \frac{h}{2}, y_i + \frac{h}{2}k_1\right) \\
k_3 &= f\left(t_i + \frac{h}{2}, y_i + \frac{h}{2}k_2\right) \\
k_4 &= f\left(t_i + h, y_i + hk_3\right) \\
y_{i+1} &= y_i + \frac{h}{6}\left[k_1 + 2k_2 + 2k_3 + k_4\right].
\end{aligned}$$

Compare this to Simpson’s-ode:

$$\begin{aligned}
k_1 &= f(t_i, y_i) \\
k_2 &= f\left(t_i + \frac{h}{2}, y_i + \frac{h}{2}k_1\right) \\
k_3 &= f\left(t_{i+1}, y_i + hk_2\right) \\
y_{i+1} &= y_i + \frac{h}{6}\left[k_1 + 4k_2 + k_3\right].
\end{aligned}$$

They are very similar. If we separate the second stage of Simpson’s-ode into two stages, we get Runge-Kutta’s order
4 method. That is the difference. Two stages are used to approximate $y(t_i + \frac{h}{2})$ instead of one!
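
To see how much the extra stage buys, the two methods can be run side by side. The following is a rough sketch in Python (rather than Octave), assuming fixed step size and a scalar o.d.e.; the function names `rk4` and `simpson_ode` are invented here. The test problem is o.d.e. 6.2.1, $y' = -\frac{y}{t} + t^2$ with $y(4) = 20$, whose exact solution gives $y(2) = 10$.

```python
def rk4(f, a, b, y0, n):
    """Approximate y(b) for y' = f(t, y), y(a) = y0, using n classic RK4 steps."""
    h = (b - a) / n
    y = y0
    for i in range(n):
        t = a + i * h
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return y


def simpson_ode(f, a, b, y0, n):
    """Same interface, but with the three-stage Simpson's-ode formulas."""
    h = (b - a) / n
    y = y0
    for i in range(n):
        t = a + i * h
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h, y + h * k2)
        y += h / 6 * (k1 + 4 * k2 + k3)
    return y


def f(t, y):
    # Right-hand side of o.d.e. 6.2.1
    return -y / t + t * t


# Integrate from t = 4 back to t = 2 with 20 steps (h = -0.1).
err_rk4 = abs(rk4(f, 4.0, 2.0, 20.0, 20) - 10.0)
err_simpson = abs(simpson_ode(f, 4.0, 2.0, 20.0, 20) - 10.0)
print(err_rk4, err_simpson)
```

The two loops differ only in the middle of the step, yet the error of the four-stage method should come out far smaller for the same step size.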
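Crumpet 35: Derivation of The (Classic) Runge-Kutta Order 4


To derive any Runge-Kutta method of order 4, the stages of the computation must be expanded in a third Taylor
polynomial:

$$\begin{aligned}
f(t_i + \beta_j h, y_i + \beta_j h k_{j-1}) = {} & f + h\beta_j\left[f_t + k_{j-1}f_y\right] + \frac{1}{2}h^2\beta_j^2\left[f_{tt} + 2k_{j-1}f_{ty} + k_{j-1}^2 f_{yy}\right] \\
& + \frac{1}{6}h^3\beta_j^3\left[f_{ttt} + 3k_{j-1}f_{tty} + 3k_{j-1}^2 f_{tyy} + k_{j-1}^3 f_{yyy}\right] + O(h^4)
\end{aligned}$$

and $y(t_0 + h)$ must be expanded in a fourth Taylor polynomial:

$$y(t_0 + h) = y(t_0) + hy'(t_0) + \frac{1}{2}h^2 y''(t_0) + \frac{1}{6}h^3 y'''(t_0) + \frac{1}{24}h^4 y^{(4)}(t_0) + O(h^5).$$

But $y^{(4)}$, in terms of $f$, is

$$\begin{aligned}
y^{(4)} &= \frac{d}{dt}\left(f_{tt} + 2f_{ty}f + f_{yy}f^2 + f_t f_y + f_y^2 f\right) \\
&= f_{yyy}f^3 + 3f_{tyy}f^2 + 4f_y f_{yy}f^2 + 3f_{tty}f + 5f_{ty}f_y f + f_y^3 f \\
&\quad + 3f_t f_{yy}f + f_t f_y^2 + f_{tt}f_y + f_{ttt} + 3f_t f_{ty}
\end{aligned}$$

so

$$\begin{aligned}
y_{i+1} = {} & y_i + hf + \frac{1}{2}h^2\left(f_t + f_y f\right) + \frac{1}{6}h^3\left(f_{tt} + 2f_{ty}f + f_{yy}f^2 + f_t f_y + f_y^2 f\right) \\
& + \frac{1}{24}h^4\left(f_{yyy}f^3 + 3f_{tyy}f^2 + 4f_y f_{yy}f^2 + 3f_{tty}f + 5f_{ty}f_y f + f_y^3 f\right. \\
& \qquad\quad \left. {} + 3f_t f_{yy}f + f_t f_y^2 + f_{tt}f_y + f_{ttt} + 3f_t f_{ty}\right) + O(h^5).
\end{aligned}$$

Furthermore,

$$k_1 = f(t_i, y_i) = f$$

and

$$\begin{aligned}
k_2 &= f(t_i + \beta_2 h, y_i + \beta_2 h k_1) \\
&= f + h\beta_2\left[f_t + f f_y\right] + \frac{1}{2}h^2\beta_2^2\left[f_{tt} + 2f f_{ty} + f^2 f_{yy}\right] \\
&\quad + \frac{1}{6}h^3\beta_2^3\left[f_{ttt} + 3f f_{tty} + 3f^2 f_{tyy} + f^3 f_{yyy}\right] + O(h^4).
\end{aligned}$$

Consequently, $k_2^2 = f^2 + 2h\beta_2\left[f_t + f f_y\right]f + O(h^2)$ and $k_2^3 = f^3 + O(h)$. Therefore

$$\begin{aligned}
k_3 &= f + h\beta_3\left[f_t + k_2 f_y\right] + \frac{1}{2}h^2\beta_3^2\left[f_{tt} + 2k_2 f_{ty} + k_2^2 f_{yy}\right] \\
&\quad + \frac{1}{6}h^3\beta_3^3\left[f_{ttt} + 3k_2 f_{tty} + 3k_2^2 f_{tyy} + k_2^3 f_{yyy}\right] + O(h^4) \\
&= f + h\beta_3\left[f_t + \left(f + h\beta_2\left[f_t + f f_y\right] + \frac{1}{2}h^2\beta_2^2\left[f_{tt} + 2f f_{ty} + f^2 f_{yy}\right]\right)f_y\right] \\
&\quad + \frac{1}{2}h^2\beta_3^2\left[f_{tt} + 2\left(f + h\beta_2\left[f_t + f f_y\right]\right)f_{ty} + \left(f^2 + 2h\beta_2\left[f_t + f f_y\right]f\right)f_{yy}\right] \\
&\quad + \frac{1}{6}h^3\beta_3^3\left[f_{ttt} + 3f f_{tty} + 3f^2 f_{tyy} + f^3 f_{yyy}\right] + O(h^4) \\
&= f + h\beta_3\left[f_t + f f_y\right] + h^2\beta_2\beta_3\left[f_t + f f_y\right]f_y + \frac{1}{2}h^2\beta_3^2\left[f_{tt} + 2f f_{ty} + f^2 f_{yy}\right] \\
&\quad + \frac{1}{2}h^3\beta_2^2\beta_3\left[f_{tt} + 2f f_{ty} + f^2 f_{yy}\right]f_y + h^3\beta_2\beta_3^2\left[f_t + f f_y\right]\left[f_{ty} + f f_{yy}\right] \\
&\quad + \frac{1}{6}h^3\beta_3^3\left[f_{ttt} + 3f f_{tty} + 3f^2 f_{tyy} + f^3 f_{yyy}\right] + O(h^4).
\end{aligned}$$

So, $k_3^2 = f^2 + 2h\beta_3\left[f_t + f f_y\right]f + O(h^2)$ and $k_3^3 = f^3 + O(h)$. Therefore

$$\begin{aligned}
k_4 &= f + h\beta_4\left[f_t + k_3 f_y\right] + \frac{1}{2}h^2\beta_4^2\left[f_{tt} + 2k_3 f_{ty} + k_3^2 f_{yy}\right] \\
&\quad + \frac{1}{6}h^3\beta_4^3\left[f_{ttt} + 3k_3 f_{tty} + 3k_3^2 f_{tyy} + k_3^3 f_{yyy}\right] + O(h^4) \\
&= f + h\beta_4\left[f_t + \left(f + h\beta_3\left[f_t + f f_y\right] + h^2\beta_2\beta_3\left[f_t + f f_y\right]f_y + \frac{1}{2}h^2\beta_3^2\left[f_{tt} + 2f f_{ty} + f^2 f_{yy}\right]\right)f_y\right] \\
&\quad + \frac{1}{2}h^2\beta_4^2\left[f_{tt} + 2\left(f + h\beta_3\left[f_t + f f_y\right]\right)f_{ty} + \left(f^2 + 2h\beta_3\left[f_t + f f_y\right]f\right)f_{yy}\right] \\
&\quad + \frac{1}{6}h^3\beta_4^3\left[f_{ttt} + 3f f_{tty} + 3f^2 f_{tyy} + f^3 f_{yyy}\right] + O(h^4) \\
&= f + h\beta_4\left[f_t + f f_y\right] + h^2\beta_3\beta_4\left[f_t + f f_y\right]f_y + \frac{1}{2}h^2\beta_4^2\left[f_{tt} + 2f f_{ty} + f^2 f_{yy}\right] \\
&\quad + h^3\beta_2\beta_3\beta_4\left[f_t + f f_y\right]f_y^2 + \frac{1}{2}h^3\beta_4\beta_3^2\left[f_{tt} + 2f f_{ty} + f^2 f_{yy}\right]f_y \\
&\quad + h^3\beta_4^2\beta_3\left[f_t + f f_y\right]\left[f_{ty} + f f_{yy}\right] + \frac{1}{6}h^3\beta_4^3\left[f_{ttt} + 3f f_{tty} + 3f^2 f_{tyy} + f^3 f_{yyy}\right] + O(h^4).
\end{aligned}$$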
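Matching coefficients in

$$\begin{aligned}
y_{i+1} = {} & y_i + hf + \frac{1}{2}h^2\left(f_t + f_y f\right) + \frac{1}{6}h^3\left(f_{tt} + 2f_{ty}f + f_{yy}f^2 + f_t f_y + f_y^2 f\right) \\
& + \frac{1}{24}h^4\left(f_{yyy}f^3 + 3f_{tyy}f^2 + 4f_y f_{yy}f^2 + 3f_{tty}f + 5f_{ty}f_y f + f_y^3 f\right. \\
& \qquad\quad \left. {} + 3f_t f_{yy}f + f_t f_y^2 + f_{tt}f_y + f_{ttt} + 3f_t f_{ty}\right) + O(h^5)
\end{aligned}$$

with coefficients in

$$y_{i+1} = y_i + h\left[\alpha_1 k_1 + \alpha_2 k_2 + \alpha_3 k_3 + \alpha_4 k_4\right]$$

up to order 4 yields the conditions

\begin{align}
\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 &= 1 \tag{6.4.5} \\
\alpha_2\beta_2 + \alpha_3\beta_3 + \alpha_4\beta_4 &= \frac{1}{2} \tag{6.4.6} \\
\alpha_2\beta_2^2 + \alpha_3\beta_3^2 + \alpha_4\beta_4^2 &= \frac{1}{3} \tag{6.4.7} \\
\alpha_3\beta_2\beta_3 + \alpha_4\beta_3\beta_4 &= \frac{1}{6} \tag{6.4.8} \\
\alpha_2\beta_2^3 + \alpha_3\beta_3^3 + \alpha_4\beta_4^3 &= \frac{1}{4} \tag{6.4.9} \\
\alpha_3\beta_3^2\beta_2 + \alpha_4\beta_4^2\beta_3 &= \frac{1}{8} \tag{6.4.10} \\
2\alpha_3\beta_3^2\beta_2 + 2\alpha_4\beta_4^2\beta_3 + \alpha_3\beta_3\beta_2^2 + \alpha_4\beta_4\beta_3^2 &= \frac{1}{3} \tag{6.4.11} \\
\alpha_3\beta_3^2\beta_2 + \alpha_4\beta_4^2\beta_3 + \alpha_3\beta_3\beta_2^2 + \alpha_4\beta_4\beta_3^2 &= \frac{5}{24} \tag{6.4.12} \\
\alpha_3\beta_3\beta_2^2 + \alpha_4\beta_4\beta_3^2 &= \frac{1}{12} \tag{6.4.13} \\
\alpha_4\beta_2\beta_3\beta_4 &= \frac{1}{24}. \tag{6.4.14}
\end{align}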


Any four-stage ($s = 4$) fourth order Runge-Kutta method of the form (6.4.2) will have to satisfy these 10 equations
with only 7 degrees of freedom (7 variables). Either the equations form a dependent set or solutions will be rare.
In an attempt to solve the system, we solve (6.4.14) for $\alpha_4$:

$$\alpha_4 = \frac{1}{24\beta_2\beta_3\beta_4}.$$

Substituting our formula for $\alpha_4$ into (6.4.8) and solving for $\alpha_3$:

$$\alpha_3 = \frac{4\beta_2 - 1}{24\beta_2^2\beta_3}.$$

Substituting our formulas for $\alpha_3$ and $\alpha_4$ into (6.4.13) and solving for $\beta_3$:

$$\beta_3 = -4\beta_2^2 + 3\beta_2.$$

Substituting our formulas for $\alpha_3$, $\alpha_4$ and $\beta_3$ into (6.4.10) and solving for $\beta_4$:

$$\beta_4 = (6 - 16\beta_2 + 16\beta_2^2)\beta_2.$$

Substituting our formulas for $\alpha_3$, $\alpha_4$, $\beta_3$ and $\beta_4$ into (6.4.6) and solving for $\alpha_2$:

$$\alpha_2 = \frac{2 - 16\beta_2 + 52\beta_2^2 - 48\beta_2^3}{24\beta_2^3(3 - 4\beta_2)}.$$

Substituting our formulas for $\alpha_2$, $\alpha_3$, $\alpha_4$, $\beta_3$ and $\beta_4$ into (6.4.7) and simplifying:

$$16\beta_2^3 - 12\beta_2^2 + 4\beta_2 - 1 = 0.$$

The roots of this last equation are $\beta_2 = \frac{1}{2}, \frac{1 \pm i\sqrt{7}}{8}$, so we conclude that $\beta_2 = \frac{1}{2}$. Back substituting, we find

$$\alpha_2 = \frac{1}{3}, \quad \beta_3 = \frac{1}{2}, \quad \beta_4 = 1, \quad \alpha_3 = \frac{1}{3}, \quad \alpha_4 = \frac{1}{6}.$$

Substituting these values of $\alpha_2$, $\alpha_3$, and $\alpha_4$ into (6.4.5), we find

$$\alpha_1 = \frac{1}{6}.$$

These seven values are the unique simultaneous real solution of the equations (6.4.14), (6.4.8), (6.4.13), (6.4.10),
(6.4.6), (6.4.7), and (6.4.5). So the seven parameters are determined by 7 of the ten conditions. It remains to
show that these seven values also satisfy (6.4.9), (6.4.11), and (6.4.12), which they do. Finally, note that these
are the values of the parameters for the (classic) Runge-Kutta method of order 4.
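
The claim that the seven values satisfy all ten conditions is easy to confirm with exact arithmetic. The check below is a small sketch in Python (not Octave) in which the left-hand sides are typed in as read from (6.4.5) through (6.4.14); the dictionary name `conditions` is an invention of this example.

```python
from fractions import Fraction as F

# Parameters of the classic Runge-Kutta method of order 4.
a1, a2, a3, a4 = F(1, 6), F(1, 3), F(1, 3), F(1, 6)
b2, b3, b4 = F(1, 2), F(1, 2), F(1)

# Each entry pairs a left-hand side with its required value.
conditions = {
    "6.4.5": (a1 + a2 + a3 + a4, F(1)),
    "6.4.6": (a2 * b2 + a3 * b3 + a4 * b4, F(1, 2)),
    "6.4.7": (a2 * b2**2 + a3 * b3**2 + a4 * b4**2, F(1, 3)),
    "6.4.8": (a3 * b2 * b3 + a4 * b3 * b4, F(1, 6)),
    "6.4.9": (a2 * b2**3 + a3 * b3**3 + a4 * b4**3, F(1, 4)),
    "6.4.10": (a3 * b3**2 * b2 + a4 * b4**2 * b3, F(1, 8)),
    "6.4.11": (2 * a3 * b3**2 * b2 + 2 * a4 * b4**2 * b3
               + a3 * b3 * b2**2 + a4 * b4 * b3**2, F(1, 3)),
    "6.4.12": (a3 * b3**2 * b2 + a4 * b4**2 * b3
               + a3 * b3 * b2**2 + a4 * b4 * b3**2, F(5, 24)),
    "6.4.13": (a3 * b3 * b2**2 + a4 * b4 * b3**2, F(1, 12)),
    "6.4.14": (a4 * b2 * b3 * b4, F(1, 24)),
}

all_hold = all(lhs == rhs for lhs, rhs in conditions.values())
print(all_hold)
```

Because `Fraction` arithmetic is exact, equality here means genuine equality and not agreement up to rounding.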


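Key Concepts

Taylor’s theorem in two variables: Suppose $f(t,y)$ and all its partial derivatives of order $n+1$ and lower are
continuous on the rectangle $D = \{(t, y) : a \le t \le b, c \le y \le d\}$, and let $(t_0, y_0) \in D$. Then for every $(t, y) \in D$,
there exist $\xi \in (a, b)$ and $\mu \in (c, d)$ such that

$$\begin{aligned}
f(t,y) = {} & f(t_0,y_0) + \left[(t - t_0) \cdot f_t(t_0,y_0) + (y - y_0) \cdot f_y(t_0,y_0)\right] \\
& + \frac{1}{2}\left[(t - t_0)^2 f_{tt}(t_0,y_0) + 2(t - t_0)(y - y_0) \cdot f_{ty}(t_0,y_0) + (y - y_0)^2 f_{yy}(t_0,y_0)\right] \\
& + \cdots + \frac{1}{n!}\sum_{j=0}^{n}\binom{n}{j}(t - t_0)^{n-j}(y - y_0)^j \frac{\partial^n f}{\partial t^{n-j}\partial y^j}(t_0,y_0) \\
& + \frac{1}{(n+1)!}\sum_{j=0}^{n+1}\binom{n+1}{j}(t - t_0)^{n+1-j}(y - y_0)^j \frac{\partial^{n+1} f}{\partial t^{n+1-j}\partial y^j}(\xi,\mu).
\end{aligned}$$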
Exercises

1. Determine analytically the local truncation error for the o.d.e. solver derived in exercise 1 on page 212.
Compare it to the local truncation error of the underlying integration formula. Are they the same? Also
compare it to the experimentally determined rate of convergence (see exercise 2 on page 213). Is it one
degree higher, as should be expected?

2. Execute one step of Runge-Kutta order four for solving $y' = ty$ with $y(1) = 0.5$ and $h = 1$, thus
approximating $y(2)$. Compare your answer to that of section 6.2 exercise 1c on page 205 in which you used
Euler’s method with two steps. The exact solution is $y(2) = \frac{e^{3/2}}{2} \approx 2.240844535169032$. [S]

3. Explain geometrically, and in your own words, the improved Euler method.

4. Write an Octave function that implements the improved Euler method (same as exercise 4 on page 213 except
this time the method has a proper name).

5. Write an Octave function that implements Heun’s third order method (same as exercise 5 on page 213
except this time the method has a proper name).

6. Write an Octave function that implements RK4.

7. Use your code from exercise 6 to compute $y(2)$ for the o.d.e. in exercise 1 on page 205 using step size
$h = 0.05$. [S] [A]
6.5 Adaptive Runge-Kutta Methods

Two of the o.d.e. solvers derived in section 6.3 used the exact same set of calculations for $k_1$, $k_2$, and $k_3$, but
combined the results differently to compute $y_{i+1}$. At the time, these were called open-ode and clopen-ode. In the
analysis of section 6.4 it was noted that open-ode was not an efficient method while clopen-ode was, at which point
we began referring to clopen-ode by its proper name, Heun’s third order method.


Crumpet 36: Heun’s third order method 


In this article from 1900 [6] Karl Heun puts forth the third order method that bears his name. Even if you
cannot read the German, his formula VI) is clear!

[Figure: facsimile of page 30 of Heun’s article, running title “Neue Methode zur approximativen Integration etc.”, showing his conditions on the coefficients, formula VI), and a table of results for a sample equation.]

Due to its inefficiency, open-ode should never be used in practice by itself, but combined with Heun’s third order
method, it has some potential usefulness.

According to Heun’s third order method

$$\begin{aligned}
k_1 &= f(t_i, y_i) \\
k_2 &= f\left(t_i + \frac{h}{3}, y_i + \frac{h}{3}k_1\right) \\
k_3 &= f\left(t_i + \frac{2h}{3}, y_i + \frac{2h}{3}k_2\right) \\
y_{i+1} &= y_i + \frac{h}{4}\left[k_1 + 3k_3\right] + O(h^4).
\end{aligned}$$

Using the same $k_1$, $k_2$, and $k_3$, the open-ode method is calculated as

$$y_{i+1} = y_i + \frac{h}{2}\left[k_2 + k_3\right] + O(h^3).$$

The difference between these estimates is

$$\frac{h}{4}\left[k_1 - 2k_2 + k_3\right] = Mh^3 + O(h^4) \tag{6.5.1}$$

for some constant $M$, and represents the local truncation error of the lower order method, open-ode. This error
estimate can be used to adapt the size of $h$ from one step to the next, decreasing the step size when the local
truncation error is bigger than some tolerance and increasing the step size when the local truncation error is smaller
than some tolerance.

To illustrate the algorithm and the benefits of adaptive routines, let’s return to o.d.e. 6.2.1, $y' = -\frac{y}{t} + t^2$, which
we have generously leaned upon already. As before we will estimate $y(2)$ given initial condition $y(4) = 20$. This
time the number of steps to compute will be determined by the algorithm, not by us, at least after the first step.
Unfortunately, there is no standard or fool-proof way to choose the size of the first step. Because we are looking
for a computation that can be done by hand, let’s try $h = -1$ to begin, $\frac{1}{2}$ of the width of the interval $[2,4]$, over
which we will integrate.

As was needed for adaptive quadrature, a desired level of accuracy, or tolerance, is needed here too. Again
because we are looking for a computation that can be done by hand, let’s try 0.1, a pretty modest accuracy.
Finally, we are ready to compute:

$$\begin{aligned}
k_1 &= f(4, 20) = 11 \\
k_2 &= f\left(4 - \frac{1}{3}, 20 - \frac{1}{3} \cdot 11\right) \approx 8.98989898989899 \\
k_3 &= f\left(4 - \frac{2}{3}, 20 - \frac{2}{3} \cdot 8.9898\ldots\right) \approx 6.90909090909091.
\end{aligned}$$


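Before computing $y_1$ from these values, we need to check that the expected accuracy of the calculation would not
violate the 0.1 requirement:

$$\left|\frac{h}{4}\left[k_1 - 2k_2 + k_3\right]\right| \approx 0.017.$$

The approximate error in stepping to $t_1 = 3$ is about 0.02, well below the desired threshold. We are clear to
proceed:

$$\begin{aligned}
y_1 &= y_0 + \frac{h}{4}\left[k_1 + 3k_3\right] \approx 12.06818181818182 \\
t_1 &= t_0 + h = 3.
\end{aligned}$$

Hence we have $y(3) \approx 12.07$. Continuing with $h = -1$,

$$\begin{aligned}
k_1 &= f(3, 12.068\ldots) \approx 4.977272727272728 \\
k_2 &= f\left(3 - \frac{1}{3}, 12.068\ldots - \frac{1}{3} \cdot 4.9773\ldots\right) \approx 3.20770202020202 \\
k_3 &= f\left(3 - \frac{2}{3}, 12.068\ldots - \frac{2}{3} \cdot 3.2077\ldots\right) \approx 1.188852813852814.
\end{aligned}$$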
Before computing $y_2$ from these values, we need to check that the expected accuracy of the calculation would not
violate the 0.1 requirement:

$$\left|\frac{h}{4}\left[k_1 - 2k_2 + k_3\right]\right| \approx 0.062.$$

The approximate error in stepping to $t_2 = 2$ is about 0.06, below the desired threshold. We are clear to
proceed:

$$\begin{aligned}
y_2 &= y_1 + \frac{h}{4}\left[k_1 + 3k_3\right] \approx 9.932224025974026 \\
t_2 &= t_1 + h = 2.
\end{aligned}$$

Hence we have $y(2) \approx 9.932$. After two steps, the actual error is about $|10 - 9.932| = 0.068$. Of course, we could
have simply executed Heun’s third order method with step size $h = -1$ (and no error checking) and gotten the same
answer. The difference is we would not have had any idea what to expect for an error! With the adaptive method,
you can be reasonably sure each step incurs only the error you request. At the risk of belaboring the point, consider
redoing the calculation with step size $h = -2$:

$$\begin{aligned}
k_1 &= f(4, 20) = 11 \\
k_2 &= f\left(4 - \frac{2}{3}, 20 - \frac{2}{3} \cdot 11\right) \approx 7.311111111111111 \\
k_3 &= f\left(4 - \frac{4}{3}, 20 - \frac{4}{3} \cdot 7.3111\ldots\right) \approx 3.266666666666667.
\end{aligned}$$

If we proceed with Heun’s third order method (and no error checking), we get

$$\begin{aligned}
y_1 &= y_0 + \frac{h}{4}\left[k_1 + 3k_3\right] \approx 9.6 \\
t_1 &= t_0 + h = 2.
\end{aligned}$$


However, without the exact answer, which is the usual situation when using a numerical method, we have no way to
know how accurate this estimate is! In that regard, the value 9.6 is a somewhat useless estimate.

On the other hand, since we know the exact value of $y(2)$ is 10, we know the error is 0.4, larger than the desired
0.1. The adaptive Heun should catch this and arrive at a more accurate estimate:

$$\left|\frac{h}{4}\left[k_1 - 2k_2 + k_3\right]\right| \approx 0.177.$$

The adaptive method would reject this step because the approximate error is greater than the desired accuracy,
without calculating $y_1$! So what should it do instead? The adaptive method will try again with a smaller step size.
Since

$$\left|\frac{h}{4}\left[k_1 - 2k_2 + k_3\right]\right| \approx Mh^3,$$

we have $Mh^3 \approx 0.177$ for any step size close to the one just attempted. If we scale the step size by a factor of $q$, say,
we should expect the new error to be approximately $M(qh)^3$, or $q^3Mh^3 \approx 0.177q^3$. Since we would like that error
to be no more than 0.1, we should choose $q$ so that $0.177q^3 < 0.1$ or $q^3 < \frac{0.1}{0.177}$, which implies $q < \sqrt[3]{\frac{0.1}{0.177}} \approx 0.8254$.

But it would slow down the algorithm immensely if the step size were too large very often, so instead, we will take
a somewhat conservative next step of $0.9qh \approx 0.9(0.8254)(-2) \approx -1.485$. Recalculating with the new step size:


$$\begin{aligned}
k_1 &= f(4, 20) = 11 \\
k_2 &= f\left(4 - \frac{1.485\ldots}{3}, 20 - \frac{1.485\ldots}{3} \cdot 11\right) \approx 8.130924301356263 \\
k_3 &= f\left(4 - 2\,\frac{1.485\ldots}{3}, 20 - 2\,\frac{1.485\ldots}{3} \cdot 8.1309\ldots\right) \approx 5.087191526760124
\end{aligned}$$

and

$$\left|\frac{h}{4}\left[k_1 - 2k_2 + k_3\right]\right| \approx 0.06487930780869297$$

so this step is accepted:

$$\begin{aligned}
y_1 &= y_0 + \frac{h}{4}\left[k_1 + 3k_3\right] \approx 10.24469652063055 \\
t_1 &= t_0 + h = 2.514132737997418.
\end{aligned}$$

Now we keep the new step size until it proves to be inappropriate. In this case, that happens right away. Another
step of $-1.485$ would take the solution to $t_2 \approx 1.028$, well past the desired $t = 2$. So, we shorten the step size to
$2 - t_1 = -0.514132737997418$. There is no worry about shortening the step size as that is expected to reduce the
error! Finally, with $h = -0.514132737997418$:


$$\begin{aligned}
k_1 &= f(2.514\ldots, 10.244\ldots) \approx 2.246020292164824 \\
k_2 &= f\left(2.514\ldots - \frac{0.514\ldots}{3}, 10.244\ldots - \frac{0.514\ldots}{3} \cdot 2.246\ldots\right) \approx 1.279876276642283 \\
k_3 &= f\left(2.514\ldots - 2\,\frac{0.514\ldots}{3}, 10.244\ldots - 2\,\frac{0.514\ldots}{3} \cdot 1.279\ldots\right) \approx 0.1988478127940674
\end{aligned}$$

and

$$\left|\frac{h}{4}\left[k_1 - 2k_2 + k_3\right]\right| \approx 0.01476646399275057,$$

this step is accepted:

$$\begin{aligned}
y_2 &= y_1 + \frac{h}{4}\left[k_1 + 3k_3\right] \approx 9.879332752200975 \\
t_2 &= t_1 + h = 2.
\end{aligned}$$

We have $y(2) \approx 9.879332752200975$ with some confidence that the error will not be terribly much more than about
0.2, since we took two steps each of which may have incurred an error of about 0.1. There is no guarantee the error
will be less than 0.2, but at least we have some confidence that it’s not drastically greater. And because we used
a conservative estimate for step size, the actual error is probably a bit smaller (as it turns out, the error is about
0.12).
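
The hand computations above all repeat one pattern: compute three stages, form the Heun estimate, and form the error estimate (6.5.1). Here is a brief sketch of that pattern in Python (not Octave); the helper name `heun3_with_estimate` is invented for this illustration, and the 0.9 safety factor and tolerance 0.1 are the ones chosen in the text.

```python
def f(t, y):
    # Right-hand side of o.d.e. 6.2.1: y' = -y/t + t^2
    return -y / t + t * t


def heun3_with_estimate(f, t, y, h):
    """One step of Heun's third order method plus the error estimate (6.5.1)."""
    k1 = f(t, y)
    k2 = f(t + h / 3, y + h / 3 * k1)
    k3 = f(t + 2 * h / 3, y + 2 * h / 3 * k2)
    y_new = y + h / 4 * (k1 + 3 * k3)
    err = abs(h / 4 * (k1 - 2 * k2 + k3))
    return y_new, err


# First attempt in the text: h = -1 from (4, 20).
y1, err1 = heun3_with_estimate(f, 4.0, 20.0, -1.0)

# The rejected attempt: h = -2 from (4, 20).
y_bad, err_bad = heun3_with_estimate(f, 4.0, 20.0, -2.0)

# Conservative rescaled step, 0.9 * q * h with q = cbrt(tol / err).
tol = 0.1
h_next = 0.9 * (tol / err_bad) ** (1 / 3) * (-2.0)

print(y1, err1)
print(y_bad, err_bad, h_next)
```

Calling the helper again from $(4, 20)$ with `h_next` should reproduce the accepted step to $t_1 \approx 2.514$.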


Adaptive Runge-Kutta (pseudo-code) 

There are many different adaptive Runge-Kutta schemes, but the one discussed here uses second and third order
methods, so might be called RK2(3). Technically, it is an order 2 method since the error estimate is for the lower
order method. In practice, however, it is often the higher order method that is used for the o.d.e. solution. While
there is never any guarantee the higher order method is more accurate than the lower order method, it rarely causes
any adverse problems. Besides hedging our bets with the 0.9 safety factor when adjusting the step size, we also
disallow any scaling of $h$ by any factor less than 0.1 or any factor greater than 5. These extra safeties are not
terribly restrictive since they allow for exponential growth or decay of $h$, but they can help avoid problems when
the error estimates are simply bad. Moreover, the estimates are only good for a small range since the constant of
proportionality may change dramatically for large changes in $h$. A more detailed discussion of the algorithm can
be found in [26] Section 16.2.

Assumptions: $y' = f(t,y)$, $y(a) = y_0$ has a unique solution over the interval from $a$ to $b$.

Input: Initial value $(a, y_0)$; function $f(t, y)$; interval endpoints, $a$ and $b$; initial step size $h$; desired accuracy
$tol$; maximum number of iterations $N$.

Step 1: Set $i = 1$; $t = a$; $y = y_0$; $done = false$;

Step 2: While not $done$ and $i \le N$ do Steps 3-14:

    Step 3: If $((b - (t + h)) \cdot (b - a) < 0)$ then set $h = b - t$; $done = true$;

    Step 4: Set $k_1 = f(t,y)$; $k_2 = f(t + \frac{h}{3}, y + \frac{h}{3}k_1)$; $k_3 = f(t + \frac{2h}{3}, y + \frac{2h}{3}k_2)$; $err = \left|\frac{h}{4}(k_1 - 2k_2 + k_3)\right|$;

    Step 5: If $done$ or $err \le tol$ then set $y = y + \frac{h}{4}(k_1 + 3k_3)$; $temp = t$; $t = t + h$; and do Step 6:

        Step 6: If $temp = t$ then do Steps 7-8:

            Step 7: Print “Method failed. Step size reached zero.”

            Step 8: Return

    Step 9: Set $i = i + 1$;

    Step 10: If $err <$ … or $err > tol$ then do Steps 11-14:

        Step 11: Set $q = 0.9\sqrt[3]{tol/err}$

        Step 12: If $q < \frac{1}{10}$ then set $q = \frac{1}{10}$

        Step 13: If $q > 5$ then set $q = 5$

        Step 14: Set $h = qh$

Step 15: If not $done$ then Print “Method failed. Maximum iterations exceeded.”

Output: Approximation of $y(b)$ or message of failure.

The formulas for $k_i$ and $err$ will need to be changed for different adaptive Runge-Kutta schemes, as will the
recalculation of $h$ in Steps 11-14, but the basic algorithm does not require modification for other embedded methods.
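
For readers who want to experiment before writing their own Octave version, here is one possible rendering of the pseudo-code in Python. It is a sketch under stated assumptions, not a transcription: the name `rk23_adaptive` is invented, the step size is rescaled after every attempt (rather than only when the error estimate leaves a band around the tolerance), the final shortened step must pass the error test like any other, and failures are raised as exceptions instead of printed. The initial `h` is assumed to point from `a` toward `b`.

```python
def rk23_adaptive(f, a, b, y0, h, tol, max_iter=10_000):
    """Estimate y(b) for y' = f(t, y), y(a) = y0 with an adaptive RK2(3) sketch."""
    if a == b:
        return y0
    t, y = a, y0
    for _ in range(max_iter):
        # Step 3: if t + h reaches or passes b, shorten the step to land on b.
        last = (b - (t + h)) * (b - a) <= 0
        if last:
            h = b - t
        # Step 4: three stages and the error estimate (6.5.1).
        k1 = f(t, y)
        k2 = f(t + h / 3, y + h / 3 * k1)
        k3 = f(t + 2 * h / 3, y + 2 * h / 3 * k2)
        err = abs(h / 4 * (k1 - 2 * k2 + k3))
        # Step 5: accept the step only if the estimate is within tolerance.
        if err <= tol:
            if t + h == t:
                raise ArithmeticError("Method failed. Step size reached zero.")
            y += h / 4 * (k1 + 3 * k3)
            t += h
            if last:
                return y
        # Steps 11-14: rescale h with safety factor 0.9, limited to [0.1, 5].
        q = 5.0 if err == 0.0 else 0.9 * (tol / err) ** (1.0 / 3.0)
        h *= min(5.0, max(0.1, q))
    raise RuntimeError("Method failed. Maximum iterations exceeded.")


def f(t, y):
    # Right-hand side of o.d.e. 6.2.1
    return -y / t + t * t


y_coarse = rk23_adaptive(f, 4.0, 2.0, 20.0, -2.0, 0.1)
y_fine = rk23_adaptive(f, 4.0, 2.0, 20.0, -1.0, 1e-6)
print(y_coarse, y_fine)
```

With tolerance 0.1 and initial step $-2$, this sketch follows the same path as the hand computation: it rejects the first attempt, accepts a step to about $t = 2.514$, and finishes with a shortened step.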

General Runge-Kutta Schemes 

Up to now, we have considered Runge-Kutta methods of the form (6.4.2), copied here for convenience: 


$$\begin{aligned}
k_1 &= f(t_i, y_i) \\
k_2 &= f(t_i + \beta_2 h, y_i + \beta_2 h k_1) \\
k_3 &= f(t_i + \beta_3 h, y_i + \beta_3 h k_2) \\
&\;\;\vdots \\
k_s &= f(t_i + \beta_s h, y_i + \beta_s h k_{s-1}) \\
y_{i+1} &= y_i + h\left[\alpha_1 k_1 + \alpha_2 k_2 + \alpha_3 k_3 + \cdots + \alpha_s k_s\right].
\end{aligned}$$

In methods of this type, $k_1$ is used in the computation of $k_2$; $k_2$ is used in the computation of $k_3$; $k_3$ is used in the
computation of $k_4$; and so on. However, there is nothing preventing one from deriving a method where both $k_1$
and $k_2$ are used in the computation of $k_3$; all of $k_1$, $k_2$, and $k_3$ are used in the computation of $k_4$; and in general
allowing all of $k_1, k_2, \ldots, k_{j-1}$ to be used in computing $k_j$. Doing so gives more degrees of freedom for satisfying
the error analysis equations, lending hope that there are many more Runge-Kutta methods possible. Any method
of this more general form is called an explicit Runge-Kutta method and can be formulated as

$$\begin{aligned}
k_1 &= f(t_i, y_i) \\
k_2 &= f(t_i + \delta_2 h, y_i + \beta_{21} h k_1) \\
k_3 &= f(t_i + \delta_3 h, y_i + \beta_{31} h k_1 + \beta_{32} h k_2) \\
&\;\;\vdots \\
k_s &= f\left(t_i + \delta_s h, y_i + \sum_{j=1}^{s-1} \beta_{sj} h k_j\right) \\
y_{i+1} &= y_i + h\left[\alpha_1 k_1 + \alpha_2 k_2 + \alpha_3 k_3 + \cdots + \alpha_s k_s\right].
\end{aligned} \tag{6.5.2}$$

Methods of this form are often summarized in a Butcher tableau,

$$\begin{array}{c|ccccc}
0 & & & & & \\
\delta_2 & \beta_{21} & & & & \\
\delta_3 & \beta_{31} & \beta_{32} & & & \\
\vdots & \vdots & \vdots & \ddots & & \\
\delta_s & \beta_{s1} & \beta_{s2} & \cdots & \beta_{s(s-1)} & \\
\hline
 & \alpha_1 & \alpha_2 & \cdots & \alpha_{s-1} & \alpha_s
\end{array}$$

much like the coefficients of a system of linear equations might be summarized in a matrix. The Butcher tableau
for any of the Runge-Kutta methods we have considered so far will take the form
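$$\begin{array}{c|cccccc}
0 & & & & & & \\
\delta_2 & \beta_{21} & & & & & \\
\delta_3 & 0 & \beta_{32} & & & & \\
\delta_4 & 0 & 0 & \beta_{43} & & & \\
\vdots & \vdots & \vdots & & \ddots & & \\
\delta_s & 0 & 0 & 0 & \cdots & \beta_{s(s-1)} & \\
\hline
 & \alpha_1 & \alpha_2 & \alpha_3 & \cdots & \alpha_{s-1} & \alpha_s
\end{array}$$

For example, Heun’s third order method would be summarized in a Butcher tableau as

$$\begin{array}{c|ccc}
0 & & & \\
\frac{1}{3} & \frac{1}{3} & & \\
\frac{2}{3} & 0 & \frac{2}{3} & \\
\hline
 & \frac{1}{4} & 0 & \frac{3}{4}
\end{array}$$

For our purposes, adaptive Runge-Kutta schemes, also called embedded methods, will be coded in a Butcher tableau
by adding one more line for the coefficients $\alpha_j$ of the lower order method. For example the Butcher tableau for
RK2(3) as presented above would be

$$\begin{array}{c|ccc}
0 & & & \\
\frac{1}{3} & \frac{1}{3} & & \\
\frac{2}{3} & 0 & \frac{2}{3} & \\
\hline
 & \frac{1}{4} & 0 & \frac{3}{4} \\
 & 0 & \frac{1}{2} & \frac{1}{2}
\end{array}$$

The most general Butcher tableaux for non-embedded methods take the form

$$\begin{array}{c|cccc}
0 & \beta_{11} & \beta_{12} & \cdots & \beta_{1s} \\
\delta_2 & \beta_{21} & \beta_{22} & \cdots & \beta_{2s} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\delta_s & \beta_{s1} & \beta_{s2} & \cdots & \beta_{ss} \\
\hline
 & \alpha_1 & \alpha_2 & \cdots & \alpha_s
\end{array}$$

If any of the $\beta_{ij}$ with $j \ge i$ are nonzero, the associated Runge-Kutta scheme is an implicit method. Each step
of the method will require solving a system of equations. Implicit Runge-Kutta methods can be considered for
approximating the solutions of stiff o.d.e.s since explicit methods are often exceedingly bad at it.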
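
One benefit of the tableau notation is that a single routine can execute any explicit method once the tableau is supplied as data. The following Python sketch (not Octave) illustrates the idea for a scalar o.d.e.; the function name `explicit_rk_step` and the list layout chosen for `delta`, `beta`, and the weight rows are assumptions of this example. The tableau entries are those of RK2(3) as described above.

```python
def explicit_rk_step(f, t, y, h, delta, beta, alpha):
    """One step of the explicit Runge-Kutta method (6.5.2) with the given tableau.

    delta[i] is the time offset of stage i, beta[i][j] multiplies stage j
    inside stage i (only j < i is used), and alpha holds the weights.
    """
    k = []
    for i in range(len(delta)):
        incr = sum(beta[i][j] * k[j] for j in range(i))
        k.append(f(t + delta[i] * h, y + h * incr))
    return y + h * sum(a * kj for a, kj in zip(alpha, k))


def f(t, y):
    # Right-hand side of o.d.e. 6.2.1
    return -y / t + t * t


# Tableau of RK2(3): shared stages, two rows of weights.
delta = [0, 1 / 3, 2 / 3]
beta = [[], [1 / 3], [0, 2 / 3]]
heun_weights = [1 / 4, 0, 3 / 4]   # third order row (Heun)
open_weights = [0, 1 / 2, 1 / 2]   # second order row (open-ode)

y_heun = explicit_rk_step(f, 4.0, 20.0, -1.0, delta, beta, heun_weights)
y_open = explicit_rk_step(f, 4.0, 20.0, -1.0, delta, beta, open_weights)
print(y_heun, y_open, abs(y_heun - y_open))
```

The difference of the two results is the error estimate (6.5.1) for this step. A practical embedded solver would compute the stages once and apply both weight rows to them; the sketch recomputes them only to keep the function small.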


Crumpet 37: A Stiff Ordinary Differential Equation 


The ordinary differential equation

$$x' = x^2 - x^3, \qquad x(0) = \delta \tag{6.5.3}$$

has no closed form solution. The best one can do is derive an implicit solution, so a numerical solution is necessary
to approximate values of the function. Some basic analysis can give an idea what the solution is like, however. It
has an equilibrium at $x = 0$, which means if $x(t_0) = 0$ for some $t_0$, then $x(t) = 0$ for all $t$. The function remains
constant for all time. It is in equilibrium. It does not change. This follows from the fact that when $x = 0$,
$x' = 0^2 - 0^3 = 0$. Similarly, the o.d.e. has an equilibrium at $x = 1$ (because 1 is another root of the polynomial
$x^2 - x^3$), and it has no others. However, the two equilibria are very different from one another. The equilibrium
at $x = 0$ is unstable while the equilibrium at $x = 1$ is stable. If $x(t_0)$ is near enough to 1 ($|x(t_0) - 1| < 1$ will do),
then $x$ will tend toward 1 as $t \to \infty$. However, there is no such condition near $x = 0$. No matter how close $x(t_0)$
is to zero, if it is positive, $x$ will still tend to the other equilibrium, 1, as $t \to \infty$. More to the point, though, is
how the values of $x$ approach 1 as $t \to \infty$.

The hope for an adaptive o.d.e. solver is that it will take large steps where the function is not varying quickly
(has a small first derivative) and will be more careful by taking small steps where the function is varying quickly
(has a large first derivative). More often than not, this is exactly what happens. Stiff o.d.e.s are an exception to
the rule where an adaptive method takes many small steps even in a region where the function has a small first
derivative. The following figures show the solution of (6.5.3) using RK2(3) with tolerance $10^{-6}$, $\delta = 10^{-3}$, and
initial step size 3 over the interval $[0, \frac{2}{\delta}]$. First, the solution over $[0, 980]$ acts as we would hope. The solver takes
large steps, including one step from $t \approx 93$ to $t \approx 210$, a step size $h > 117$ at the beginning where the function
changes very slowly.


[Figure: numerical solution $x$ versus $t$ over $[0, 980]$; $x$ ranges from 0 to about 0.045.]


In the middle, the solution over [980, 1020] continues to act as we would hope. The solution begins to vary more 
quickly here and, consequently, the solver takes a number of smaller steps. 



Toward the end, the solution over [1020, 2000] demonstrates the consequence of stiffness. The exact solution is 
very nearly constant over this region, gradually approaching 1 from below. A good solver would again take large 
steps across this region, but adaptive explicit Runge-Kutta schemes do not. The numerical solution oscillates 
within tolerance about 1, so it does what it is supposed to do, but it takes many short steps to do so. 


[Figure: numerical solution $x$ versus $t$ over $[1020, 2000]$; $x$ stays between 0.999998 and 0.999999.]
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Key Concepts 

Embedded Runge-Kutta method: A Runge-Kutta method in which there are two schemes of different orders 
derived from the same set of function evaluations. 

Adaptive Runge-Kutta method: A Runge-Kutta method that takes advantage of an embedded Runge-Kutta 
scheme to automatically adapt the step size as it estimates the solution of an o.d.e. 

Butcher tableau: A tabular representation of a Runge-Kutta method. 

RKm(n): Shorthand for an embedded Runge-Kutta method containing schemes with rates of convergence 
(commonly called orders) m and n. 
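
As a rough illustration of how a Butcher tableau encodes a method, the following sketch advances a scalar o.d.e. by one step of an explicit Runge-Kutta scheme read directly from its tableau. The function name rk_step and its argument order are assumptions made for this example, not code from the text. The tableau used is that of the classical fourth order method.

```octave
1;  % mark this file as a script so a function can be defined in it

% One step of an explicit Runge-Kutta method for a scalar o.d.e. y' = f(t,y).
% c: nodes (s x 1), A: strictly lower triangular (s x s), b: weights (1 x s).
% rk_step is a hypothetical helper name used only for this sketch.
function ynew = rk_step(f, t, y, h, c, A, b)
  s = numel(c);
  k = zeros(s, 1);
  for i = 1:s
    % stage i uses only the previously computed stages 1..i-1
    k(i) = f(t + c(i)*h, y + h*(A(i, 1:i-1)*k(1:i-1)));
  end
  ynew = y + h*(b*k);
end

% Classical RK4 tableau
c = [0; 1/2; 1/2; 1];
A = [0 0 0 0; 1/2 0 0 0; 0 1/2 0 0; 0 0 1 0];
b = [1/6 1/3 1/3 1/6];

f = @(t, y) y;                 % test problem y' = y, y(0) = 1
y1 = rk_step(f, 0, 1, 0.1, c, A, b);
printf("RK4 step: %.10f   exact: %.10f\n", y1, exp(0.1));
```

An embedded pair would reuse the same stage values k with a second row of weights, which is what makes the error estimate inexpensive.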


Exercises 

1. Write an Octave function that implements RK2(3) 
as presented in pseudo-code. 

2. Which are the Butcher tableaux of implicit methods? 

[A] 


(a) 


(b) 


0 







1 

1 


1 




4 

8 


8 




1 

0 


1 




2 


2 




3 

3 


0 

9 



4 

16 


16 



l 

3 


2 

12 

8 


7 



7 

7 



7 


32 

12 

32 

7 


90 


90 

90 

90 

90 

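$$
\begin{array}{c|ccccc}
0 & & & & & \\
\frac{1}{4} & \frac{1}{4} & & & & \\
\frac{3}{4} & -\frac{9}{4} & 3 & & & \\
\frac{1}{2} & \frac{1}{18} & \frac{5}{12} & \frac{1}{36} & & \\
1 & \frac{7}{9} & -\frac{5}{3} & -\frac{1}{9} & 2 & \\
\hline
 & \frac{1}{6} & 0 & 0 & \frac{2}{3} & \frac{1}{6}
\end{array}
$$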

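(c) 

$$
\begin{array}{c|cccc}
0 & & & & \\
\frac{1}{2} & \frac{1}{2} & & & \\
\frac{1}{2} & 0 & \frac{1}{2} & & \\
1 & 0 & 0 & 1 & \\
\hline
 & \frac{1}{6} & \frac{1}{3} & \frac{1}{3} & \frac{1}{6}
\end{array}
$$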




(d) 


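$$
\begin{array}{c|cccc}
0 & \frac{1}{12} & -\frac{\sqrt{5}}{12} & \frac{\sqrt{5}}{12} & -\frac{1}{12} \\
\frac{5-\sqrt{5}}{10} & \frac{1}{12} & \frac{1}{4} & \frac{10-7\sqrt{5}}{60} & \frac{\sqrt{5}}{60} \\
\frac{5+\sqrt{5}}{10} & \frac{1}{12} & \frac{10+7\sqrt{5}}{60} & \frac{1}{4} & -\frac{\sqrt{5}}{60} \\
1 & \frac{1}{12} & \frac{5}{12} & \frac{5}{12} & \frac{1}{12} \\
\hline
 & \frac{1}{12} & \frac{5}{12} & \frac{5}{12} & \frac{1}{12}
\end{array}
$$
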

3. Show that this is the Butcher tableau for Euler’s 
method. 


$$
\begin{array}{c|c}
0 & 0 \\
\hline
 & 1
\end{array}
$$


4. Show that this is the Butcher tableau for the improved 
Euler method. 

$$
\begin{array}{c|cc}
0 & & \\
1 & 1 & \\
\hline
 & \frac{1}{2} & \frac{1}{2}
\end{array}
$$


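5. Show that the method given by the Butcher tableau 
has order 2 for any $\delta \in [\frac{1}{2}, 1]$. 

$$
\begin{array}{c|cc}
0 & & \\
\delta & \delta & \\
\hline
 & 1 - \frac{1}{2\delta} & \frac{1}{2\delta}
\end{array}
$$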


6. Demonstrate numerically that the method suggested 
by the Butcher tableau has rate of convergence 
$O(h^3)$. 


(a) 


(b) 


(c) 



7. Euler’s method and the improved Euler method use the 
same function evaluations. Thus, they can be combined 
into an embedded, and therefore adaptive, method. 
Write the Butcher tableau for the Euler/improved Euler 
embedded method. 

8. Write an Octave function that implements the 
adaptive method suggested in exercise 7. 

9. 3/8-rule Runge-Kutta method. Demonstrate numerically 
that the 3/8-rule method, given by the Butcher 
tableau, has rate of convergence $O(h^4)$. 

10. Write an Octave function that implements the 
RK3(4) adaptive method ([6] page 301) given by the 
Butcher tableau. 


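$$
\begin{array}{c|ccccc}
0 & & & & & \\
\frac{1}{4} & \frac{1}{4} & & & & \\
\frac{3}{4} & -\frac{9}{4} & 3 & & & \\
\frac{1}{2} & \frac{1}{18} & \frac{5}{12} & \frac{1}{36} & & \\
1 & \frac{7}{9} & -\frac{5}{3} & -\frac{1}{9} & 2 & \\
\hline
 & \frac{1}{6} & 0 & 0 & \frac{2}{3} & \frac{1}{6} \\
 & \frac{7}{9} & -\frac{5}{3} & -\frac{1}{9} & 2 & 0
\end{array}
$$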


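11. Cash-Karp RK4(5). Write an Octave function 
that implements the Cash-Karp adaptive method given 
by the Butcher tableau. 

$$
\begin{array}{c|cccccc}
0 & & & & & & \\
\frac{1}{5} & \frac{1}{5} & & & & & \\
\frac{3}{10} & \frac{3}{40} & \frac{9}{40} & & & & \\
\frac{3}{5} & \frac{3}{10} & -\frac{9}{10} & \frac{6}{5} & & & \\
1 & -\frac{11}{54} & \frac{5}{2} & -\frac{70}{27} & \frac{35}{27} & & \\
\frac{7}{8} & \frac{1631}{55296} & \frac{175}{512} & \frac{575}{13824} & \frac{44275}{110592} & \frac{253}{4096} & \\
\hline
 & \frac{37}{378} & 0 & \frac{250}{621} & \frac{125}{594} & 0 & \frac{512}{1771} \\
 & \frac{2825}{27648} & 0 & \frac{18575}{48384} & \frac{13525}{55296} & \frac{277}{14336} & \frac{1}{4}
\end{array}
$$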


12. The following pairs of Runge-Kutta methods use the 
same function evaluations, but have different rates of 
convergence. They can each therefore be paired to form 
an embedded Runge-Kutta scheme. Write the Butcher 
tableau for the embedded method. 

(a) The method of exercise 6a and open-ode. 

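(b) The 3/8-rule (exercise 9) and the following. 

$$
\begin{array}{c|ccc}
0 & & & \\
\frac{1}{3} & \frac{1}{3} & & \\
\frac{2}{3} & -\frac{1}{3} & 1 & \\
\hline
 & 0 & \frac{1}{2} & \frac{1}{2}
\end{array}
$$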


(c) The 3/8-rule (exercise 9) and the following. 


0 


1 

1 

3 

3 

2 

- 1 i 

3 

3 

l 

l -l l 


- — - 0 1 

2 2 u 


(a) The method of exercise 6b and the following. 


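$$
\begin{array}{c|ccccc}
0 & & & & & \\
\frac{2}{7} & \frac{2}{7} & & & & \\
\frac{4}{7} & -\frac{8}{35} & \frac{4}{5} & & & \\
\frac{6}{7} & \frac{29}{42} & -\frac{2}{3} & \frac{5}{6} & & \\
1 & \frac{1}{6} & \frac{1}{6} & \frac{5}{12} & \frac{1}{4} & \\
\hline
 & \frac{11}{96} & \frac{7}{24} & \frac{35}{96} & \frac{7}{48} & \frac{1}{12}
\end{array}
$$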


(b) Bogacki-Shampine RK2(3). The method of 
exercise 6c and the following. 



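13. Butcher [6] credits Merson (1957) with the earliest 
proposed embedded Runge-Kutta method, given by the 
Butcher tableau. What are the orders of the two methods? 

$$
\begin{array}{c|ccccc}
0 & & & & & \\
\frac{1}{3} & \frac{1}{3} & & & & \\
\frac{1}{3} & \frac{1}{6} & \frac{1}{6} & & & \\
\frac{1}{2} & \frac{1}{8} & 0 & \frac{3}{8} & & \\
1 & \frac{1}{2} & 0 & -\frac{3}{2} & 2 & \\
\hline
 & \frac{1}{6} & 0 & 0 & \frac{2}{3} & \frac{1}{6} \\
 & \frac{1}{10} & 0 & \frac{3}{10} & \frac{2}{5} & \frac{1}{5}
\end{array}
$$
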


14. Merson (1957). Write an Octave function that 
implements the adaptive method of exercise 13. 

15. The initial value problem 

, a; + 26^ cos(e :E ) 

y l + ey 

3/(0) = 2 (6.5.4) 


can not be solved analytically. The solution must be 
approximated. Use your code from the given exercise to 
approximate $y(4)$ with an error of no more than $10^{-4}$. 

(a) 1 [S] 
(b) 8 
(c) 10 
(d) 11 
(e) 12a 
(f) 12b [A] 
(g) 12c 
(h) 12a 
(i) 12b 
(j) 13 
(k) 14 
16. The initial value problem 

$$
\begin{aligned}
y' &= \frac{x^2 + y}{x - y^2} \\
y(0) &= 5
\end{aligned}
\tag{6.5.5}
$$

can not be solved analytically. The solution must be 
approximated. Use your code from the given exercise to 
approximate $y(3)$ with an error of no more than $10^{-4}$. 

(a) 1 [S] 
(b) 8 
(c) 10 
(d) 11 
(e) 12a 
(f) 12b [A] 
(g) 12c 
(h) 12a 
(i) 12b 
(j) 13 
(k) 14 


17. • Consider the initial value 


a 2 xy 

2 /( 1 ) = 1 - 

(a) Use your code from exercise 5 on page 226 (Heun’s 
third order method) to estimate $y(2)$ with step 
size 0.01. 

(b) Use your code from exercise 6 on page 226 (RK4) 
to estimate $y(2)$ with step size 0.01. 

(c) Compare the results of parts (a) and (b). You 
should notice that they are rather different. The 
rest of this exercise explores the reason for the 
discrepancy. 

(d) Use your code from exercise 1 (RK2(3)) to estimate 
$y(2)$ with tolerance 0.001 and maximum number 
of steps 1000. 

(e) Use your code from any of the parts of exercise 12 
to estimate $y(2)$ with tolerance 0.001 and maximum 
number of steps 1000. 

(f) You should have found that the method fails in 
both parts (d) and (e). However, if you look at the 
last calculated values of x and y anyway (x(1001) 
and y(1001)), you should find that in both cases, 
$x \approx 1.648$ and $y \approx 0$. The failure to approximate 
$y(2)$ is not a shortcoming of the numerical 
method. The solution of the initial value problem 
only exists over the interval $[1, \sqrt{e}) \approx [1, 1.648)$. 
For dependable results, care must be taken that 
the solution of the o.d.e. exists and is unique over 
the entire interval from a to b. That said, the basic 
(non-adaptive) solvers plow right along and 
give an approximation for $y(2)$ that is entirely 
incorrect. Without some further analysis, you may 
not notice that the basic solvers are producing 
bogus information. On the other hand, the adaptive 
solvers give some clue as to what is going on 


problem 
1 + 2 1 2 


due to their failure to proceed beyond $x = \sqrt{e}$. 
They get “stuck” taking tinier and tinier steps 
near $x = \sqrt{e}$, as they should since the solution 
does not exist beyond that point. 

18. Attempt to approximate $y(4)$ for the initial value 
problem in exercise 16. Use a variety of adaptive and 
non-adaptive methods with a variety of tolerances. You 
should find that you can not obtain dependable results. 
Can you explain why not? HINT: You may wish to plot 
the approximate solutions. If your solvers are written 
so as to store the points in arrays, it is a simple matter 
to plot the solutions, as demonstrated for RK2(3), 
using the code from the solution of exercise 1. 

[y,x]=rk23(f,0,5,4,.0001,1000); 
plot(x,y) 

19. The initial value problem 

$$
\begin{aligned}
y' &= \ln(x + y) \\
y(0) &= \tfrac{1}{2}
\end{aligned}
$$

can not be solved analytically. The solution must be 
approximated. Apply the indicated method to compute 
$y(5)$ using tolerance $10^{-4}$ and an initial step 
size A. Is the global error (the error in approximating 
$y(5)$) around $10^{-4}$? significantly smaller? 
significantly larger? Accurate to 10 significant digits, 
$y(5) = 6.409445034$. [A] 

(a) Cash-Karp (exercise 11) 

(b) Bogacki-Shampine (exercise 12b) 

(c) Merson (exercise 14) 

(d) RK2(3) (exercise 1) 

20. Modify the code you used in exercise 19 to count 
the number of function evaluations performed. Which 
method was most efficient? The method with the 
fewest evaluations was the most efficient. 

21. There are many embedded methods not mentioned 
in this text, mostly of high order. Look some of 
them up, write code to implement them, and test your 
code. In particular, you may look for the methods of 
Fehlberg, Verner, or Dormand & Prince. 

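22. The Cash-Karp RK4(5) method [8] was designed to 
contain embedded methods of all orders from 1 through 
5, not just orders 4 and 5. Show that the three embedded 
methods given in the Butcher tableau have the 
indicated orders. 

$$
\begin{array}{c|cccc|l}
0 & & & & & \\
\frac{1}{5} & \frac{1}{5} & & & & \\
\frac{3}{10} & \frac{3}{40} & \frac{9}{40} & & & \\
\frac{3}{5} & \frac{3}{10} & -\frac{9}{10} & \frac{6}{5} & & \\
\hline
 & \frac{19}{54} & 0 & -\frac{10}{27} & \frac{55}{54} & \text{Order 3} \\
 & -\frac{3}{2} & \frac{5}{2} & 0 & 0 & \text{Order 2} \\
 & 1 & 0 & 0 & 0 & \text{Order 1}
\end{array}
$$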
Solutions to Selected Exercises 


Section 1.1 

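3a: $|\tilde{p} - p| = \left|\frac{1106}{9} - 123\right| = \frac{1}{9} \approx 0.111$ 

3c: $|\tilde{p} - p| = |1000 - 2^{10}| = |1000 - 1024| = 24$ 

3e: $|\tilde{p} - p| = |10^{-4} - \pi^{-7}| = |0.0001 - \pi^{-7}| \approx 2.3109(10)^{-4}$, using the Octave command 

abs(10^-4-pi^-7). 

4a: $\frac{|\tilde{p} - p|}{|p|} = \frac{\left|\frac{1106}{9} - 123\right|}{123} = \frac{1}{1107} \approx 9.03(10)^{-4}$ 

4c: $\frac{|\tilde{p} - p|}{|p|} = \frac{|1000 - 2^{10}|}{2^{10}} = \frac{3}{128} \approx 0.0234$ 

4e: $\frac{|\tilde{p} - p|}{|p|} = \frac{\left|\frac{1}{10000} - \pi^{-7}\right|}{\pi^{-7}} \approx 0.69797$, using the Octave command 

abs(10^-4-pi^-7)/pi^-7. 

5a: $\log\left|\frac{p}{\tilde{p} - p}\right| = \log\left|\frac{123}{\frac{1106}{9} - 123}\right| \approx 3.0$ 

5c: $\log\left|\frac{p}{\tilde{p} - p}\right| = \log\left|\frac{2^{10}}{1000 - 2^{10}}\right| \approx 1.6$ 

5e: $\log\left|\frac{p}{\tilde{p} - p}\right| = \log\left|\frac{\pi^{-7}}{10^{-4} - \pi^{-7}}\right| \approx 0.15616$, using the Octave command 

log(pi^-7/abs(10^-4-pi^-7))/log(10). 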
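
The three error measures above can be checked together in a few lines. This is only a sketch: the variable names p, ptilde, abs_err, rel_err, and sig_digits are chosen for this example and do not come from the text.

```octave
% Errors in approximating p = pi^-7 by ptilde = 10^-4 (exercises 3e, 4e, 5e).
p = pi^-7;
ptilde = 10^-4;

abs_err = abs(ptilde - p)                 % absolute error
rel_err = abs_err/abs(p)                  % relative error
sig_digits = log(abs(p)/abs_err)/log(10)  % base-10 log of |p/(ptilde - p)|
```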


10a: $f(2) = e^{\sin(2)}$. In Octave: exp(sin(2)), which gives 2.4826. 

10c: $f(2) = \tan^{-1}(2 - 0.429)$. In Octave: atan(2-0.429), which gives 1.0039. 

12a: We need to find $\tilde{p}$ such that $|\tilde{p} - \pi| = 0.001$, so $\tilde{p} - \pi = \pm 0.001$, so $\tilde{p} = \pi \pm 0.001$. There are two possible 
solutions, $\pi - 0.001 \approx 3.14059$ and $\pi + 0.001 \approx 3.14259$. 

12c: We need to find $\tilde{p}$ such that $|\tilde{p} - \ln(3)| = 0.001$, so $\tilde{p} - \ln(3) = \pm 0.001$, so $\tilde{p} = \ln(3) \pm 0.001$. There are two 
possible solutions, $\ln(3) - 0.001 \approx 1.09761$ and $\ln(3) + 0.001 \approx 1.09961$. 

12e: We need to find $\tilde{p}$ such that $\left|\tilde{p} - \frac{10}{\ln(1.1)}\right| = 0.001$, so $\tilde{p} - \frac{10}{\ln(1.1)} = \pm 0.001$, so $\tilde{p} = \frac{10}{\ln(1.1)} \pm 0.001$. There are two 
possible solutions, $\frac{10}{\ln(1.1)} - 0.001 \approx 104.91958$ and $\frac{10}{\ln(1.1)} + 0.001 \approx 104.92158$. 


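13a: We need to find $\tilde{p}$ such that $\frac{|\tilde{p} - \pi|}{|\pi|} = 0.001$, so $\tilde{p} - \pi = \pm 0.001\pi$, so $\tilde{p} = \pi(1 \pm 0.001)$. There are two possible 
solutions, $\pi(0.999) \approx 3.13845$ and $\pi(1.001) \approx 3.14473$. 

13c: We need to find $\tilde{p}$ such that $\frac{|\tilde{p} - \ln(3)|}{|\ln(3)|} = 0.001$, so $\tilde{p} - \ln(3) = \pm 0.001\ln(3)$, so $\tilde{p} = \ln(3)(1 \pm 0.001)$. There are 
two possible solutions, $\ln(3)(0.999) \approx 1.09751$ and $\ln(3)(1.001) \approx 1.09971$. 

13e: We need to find $\tilde{p}$ such that $\frac{\left|\tilde{p} - \frac{10}{\ln(1.1)}\right|}{\left|\frac{10}{\ln(1.1)}\right|} = 0.001$, so $\tilde{p} - \frac{10}{\ln(1.1)} = \pm 0.001\frac{10}{\ln(1.1)}$, so $\tilde{p} = \frac{10}{\ln(1.1)}(1 \pm 0.001)$. There 
are two possible solutions, $\frac{10}{\ln(1.1)}(0.999) \approx 104.81566$ and $\frac{10}{\ln(1.1)}(1.001) \approx 105.02550$. 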


Section 1.2 

1a: From Taylor’s theorem, $T_3(x) = \sum_{k=0}^{3} \frac{f^{(k)}(x_0)}{k!}(x - x_0)^k = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2}(x - x_0)^2 + \frac{f'''(x_0)}{6}(x - x_0)^3$ 
for any function $f$ with enough derivatives. So to find $T_3(x)$, we need to evaluate $f$, $f'$, $f''$, $f'''$ at 
$x_0 = 0$. To that end, $f(x) = \sin(x)$, so $f'(x) = \cos(x)$, $f''(x) = -\sin(x)$, and $f'''(x) = -\cos(x)$. Therefore, 
$f(x_0) = \sin(0) = 0$, $f'(x_0) = \cos(0) = 1$, $f''(x_0) = -\sin(0) = 0$, and $f'''(x_0) = -\cos(0) = -1$. Substituting 
this information into the formula for $T_3(x)$, we have 

$$T_3(x) = 0 + 1 \cdot (x - 0) + \frac{0}{2} \cdot (x - 0)^2 + \frac{-1}{6} \cdot (x - 0)^3 = x - \frac{1}{6}x^3.$$

Also from Taylor’s Theorem, we know $R_3(x) = \frac{f^{(4)}(\xi)}{4!}(x - x_0)^4$ for any function $f$ with enough derivatives. 
So we need to evaluate $f^{(4)}(x)$ at $x = \xi$. To that end, $f^{(4)}(x) = \sin(x)$ so $f^{(4)}(\xi) = \sin(\xi)$. Hence, 

$$R_3(x) = \frac{\sin(\xi)}{24}x^4.$$


1c: From Taylor’s theorem, $T_3(x) = \sum_{k=0}^{3} \frac{f^{(k)}(x_0)}{k!}(x - x_0)^k = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2}(x - x_0)^2 + \frac{f'''(x_0)}{6}(x - x_0)^3$ 
for any function $f$ with enough derivatives. So to find $T_3(x)$, we need to evaluate $f$, $f'$, $f''$, $f'''$ at 
$x_0 = \pi$. To that end, $f(x) = \sin(x)$, so $f'(x) = \cos(x)$, $f''(x) = -\sin(x)$, and $f'''(x) = -\cos(x)$. Therefore, 
$f(x_0) = \sin(\pi) = 0$, $f'(x_0) = \cos(\pi) = -1$, $f''(x_0) = -\sin(\pi) = 0$, and $f'''(x_0) = -\cos(\pi) = 1$. Substituting 
this information into the formula for $T_3(x)$, we have 

$$T_3(x) = 0 + (-1) \cdot (x - \pi) + \frac{0}{2} \cdot (x - \pi)^2 + \frac{1}{6} \cdot (x - \pi)^3 = \pi - x + \frac{1}{6}(x - \pi)^3.$$

Also from Taylor’s Theorem, we know $R_3(x) = \frac{f^{(4)}(\xi)}{4!}(x - x_0)^4$ for any function $f$ with enough derivatives. 
So we need to evaluate $f^{(4)}(x)$ at $x = \xi$. To that end, $f^{(4)}(x) = \sin(x)$ so $f^{(4)}(\xi) = \sin(\xi)$. Hence, 

$$R_3(x) = \frac{\sin(\xi)}{24}(x - \pi)^4.$$
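
A quick numerical sanity check of the expansion about $x_0 = 0$ from 1a can be sketched as follows. The evaluation point x = 0.5 is an arbitrary choice for this example, and the bound uses only the fact that $|\sin(\xi)| \le 1$.

```octave
% Compare sin(x) with its third Taylor polynomial about x0 = 0
% and with the remainder bound |R3(x)| <= x^4/24.
T3 = @(x) x - x.^3/6;

x = 0.5;                        % arbitrary test point (an assumption)
err = abs(sin(x) - T3(x))       % actual absolute error
bound = x^4/24                  % bound obtained from |sin(xi)| <= 1
```

The actual error is roughly ten times smaller than the bound here, which is typical: the bound assumes the worst possible value of $\sin(\xi)$.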


8: (a) 1 (b) 0.87760 (c) 0.54167 (d) 0.12391 

octave:1> f=inline('1-x^2/2+x^4/24') 
f = f(x) = 1-x^2/2+x^4/24 
octave:2> f(0) 
ans = 1 
octave:3> f(1/2) 
ans = 0.87760 
octave:4> f(1) 
ans = 0.54167 
octave:5> f(pi) 
ans = 0.12391 
10: taylorExercise.m: 

f=inline('1-x^2/2+x^4/24'); 
f(0) 
f(1/2) 
f(1) 
f(pi) 

Running taylorExercise.m: 

octave:1> taylorExercise 
ans = 1 
ans = 0.87760 
ans = 0.54167 
ans = 0.12391 


26: (a) From Taylor’s theorem, $T_2(x) = \sum_{k=0}^{2} \frac{f^{(k)}(x_0)}{k!}(x - x_0)^k = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2}(x - x_0)^2$ for 
any function $f$ with enough derivatives. So to find $T_2(x)$, we need to evaluate $f$, $f'$, and $f''$ at $x_0 = 5$. To 
that end, $f(x) = \frac{1}{x}$, so $f'(x) = -\frac{1}{x^2}$, and $f''(x) = \frac{2}{x^3}$. Therefore, $f(x_0) = \frac{1}{5}$, $f'(x_0) = -\frac{1}{25}$, and $f''(x_0) = \frac{2}{125}$. 
Substituting this information into the formula for $T_2(x)$, we have 

$$T_2(x) = \frac{1}{5} - \frac{x - 5}{25} + \frac{(x - 5)^2}{125}.$$

(b) From Taylor’s Theorem, $R_2(x) = \frac{f'''(\xi)}{3!}(x - x_0)^3$ for any function $f$ with enough derivatives. So we need 
to evaluate $f'''(x)$ at $x = \xi$. To that end, $f'''(x) = -\frac{6}{x^4}$ so $f'''(\xi) = -\frac{6}{\xi^4}$. Hence, 

$$R_2(x) = \frac{-6/\xi^4}{6}(x - 5)^3 = -\frac{(x - 5)^3}{\xi^4}.$$

(c) $f(1) \approx T_2(1) = \frac{1}{5} - \frac{1 - 5}{25} + \frac{(1 - 5)^2}{125} = \frac{1}{5} + \frac{4}{25} + \frac{16}{125} = \frac{61}{125}$ and $f(9) \approx T_2(9) = \frac{1}{5} - \frac{9 - 5}{25} + \frac{(9 - 5)^2}{125} = \frac{1}{5} - \frac{4}{25} + \frac{16}{125} = \frac{21}{125}$. 

(d) The bounds are $64$ and $\frac{64}{625}$ respectively. According to Taylor’s Theorem, the absolute error $|f(x) - T_2(x)| = 
|R_2(x)|$ for some $\xi$ strictly between $x$ and $x_0$. So we can obtain a theoretical bound by bounding $|R_2(x)|$ over 
all values of $\xi$ between $x$ and $x_0$. For $x = 1$, $R_2(x) = -\frac{(1 - 5)^3}{\xi^4} = \frac{64}{\xi^4}$. Hence, $|f(1) - T_2(1)| \le \max_{\xi \in [1,5]} \frac{64}{\xi^4}$. 
Since $\frac{64}{\xi^4}$ is a decreasing function of $\xi$ over the interval from 1 to 5, its maximum value is obtained at $\xi = 1$. 
Finally, we can conclude $|f(1) - T_2(1)| \le 64$. Similarly, $|f(9) - T_2(9)| \le \max_{\xi \in [5,9]} \frac{64}{\xi^4}$. We get a much smaller 
bound, though, since we are finding our bound over the interval from 5 to 9: $|f(9) - T_2(9)| \le \frac{64}{5^4} = \frac{64}{625}$. 

(e) The bounds are $\frac{64}{625}$ and $\frac{64}{6561}$ respectively. Just as we can find an upper bound on the absolute error, we 
can find a lower bound. The same analysis applies up to the point where we maximized the remainder term 
over an interval of $\xi$ values. The only change is that we now must minimize this function over the interval. 
So $|f(1) - T_2(1)| \ge \min_{\xi \in [1,5]} \frac{64}{\xi^4}$ and $|f(9) - T_2(9)| \ge \min_{\xi \in [5,9]} \frac{64}{\xi^4}$. Since $\frac{64}{\xi^4}$ is a decreasing function of $\xi$ over the 
interval from 1 to 5 (and over the interval from 5 to 9), its minimum value is obtained at the right endpoint. 
So $|f(1) - T_2(1)| \ge \frac{64}{5^4} = \frac{64}{625}$ and $|f(9) - T_2(9)| \ge \frac{64}{9^4} = \frac{64}{6561}$. 

(f) $|f(1) - T_2(1)| = \left|1 - \frac{61}{125}\right| = \frac{64}{125} = 0.5120$. Indeed $\frac{64}{625} \le \frac{64}{125} \le 64$. $|f(9) - T_2(9)| = \left|\frac{1}{9} - \frac{21}{125}\right| = \frac{64}{1125} \approx 0.0569$. 
Indeed $\frac{64}{6561} \le \frac{64}{1125} \le \frac{64}{625}$. 

(g) 
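
The actual errors and the theoretical bounds from parts (d) through (f) can be compared numerically. The following is a minimal sketch; the names T2, err1, err9, ok1, and ok9 are assumptions made for this example.

```octave
% f(x) = 1/x and its second Taylor polynomial about x0 = 5
f = @(x) 1./x;
T2 = @(x) 1/5 - (x - 5)/25 + (x - 5).^2/125;

err1 = abs(f(1) - T2(1))        % actual error at x = 1
err9 = abs(f(9) - T2(9))        % actual error at x = 9

% each actual error should lie between its lower and upper bound
ok1 = (64/625 <= err1) && (err1 <= 64)
ok9 = (64/6561 <= err9) && (err9 <= 64/625)
```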



30b: Perhaps it may initially come as a surprise, but we do not need to find $T_4(x)$ in order to answer this question. 
The matter of error is entirely taken up by the remainder term. So we need only calculate $R_4(x)$. This does, 
however, require us to find the first 5 derivatives of $f(x)$: 

$$
\begin{aligned}
f(x) &= e^{-x^2} \\
f'(x) &= -2xe^{-x^2} \\
f''(x) &= -2e^{-x^2} + (-2x)(-2xe^{-x^2}) \\
 &= 2(2x^2 - 1)e^{-x^2} \\
f'''(x) &= 2[4xe^{-x^2} + (2x^2 - 1)(-2xe^{-x^2})] \\
 &= -4(2x^3 - 3x)e^{-x^2} \\
f^{(4)}(x) &= -4[(6x^2 - 3)e^{-x^2} + (2x^3 - 3x)(-2xe^{-x^2})] \\
 &= -4(-4x^4 + 12x^2 - 3)e^{-x^2} \\
f^{(5)}(x) &= -4[(-16x^3 + 24x)e^{-x^2} + (-4x^4 + 12x^2 - 3)(-2xe^{-x^2})] \\
 &= -8(4x^5 - 20x^3 + 15x)e^{-x^2}
\end{aligned}
$$

Now, $R_4(x) = \frac{f^{(5)}(\xi)}{5!}x^5 = \frac{-8(4\xi^5 - 20\xi^3 + 15\xi)e^{-\xi^2}}{120}x^5 = -\frac{x^5}{15}(4\xi^5 - 20\xi^3 + 15\xi)e^{-\xi^2}$. For any given value of $x$, we 
are faced with maximizing the absolute value of this expression over all $\xi$ between 0 and $x$. We may ignore the 
$\frac{x^5}{15}$ factor which is independent of $\xi$, and focus on finding extrema of $(4\xi^5 - 20\xi^3 + 15\xi)e^{-\xi^2}$. Sometimes, at 
this point, the expression requiring optimization is easy enough to handle using standard calculus techniques: 
finding critical points and evaluating. However, in this case, that would involve finding the roots of a sixth 
degree polynomial. Ironically, techniques we will learn later in this course would be helpful right now, but as 
it is, we have no way to do that in general. The best we can do is have a look at a graph and hope it helps. 
Letting $g(\xi) = (4\xi^5 - 20\xi^3 + 15\xi)e^{-\xi^2}$, we proceed by graphing $g(\xi)$: 
With the goal of maximization in mind, it makes sense to take note of the relative extrema. The function 
appears to have 6 relative extrema and seems to approach zero as $\xi$ approaches $\pm\infty$. To confirm that these 
observations are facts, we start by calculating $g'(\xi) = -(8\xi^6 - 60\xi^4 + 90\xi^2 - 15)e^{-\xi^2}$. Since a sixth degree 
polynomial has at most 6 distinct roots, $g$ has at most 6 relative extrema. Since we can see 6 relative extrema 
on the graph, there are no others. Also, 

$$\lim_{\xi \to \pm\infty} -(8\xi^6 - 60\xi^4 + 90\xi^2 - 15)e^{-\xi^2} = 0$$

since the exponential factor dominates the polynomial factor. We would possibly not have thought to consider 
these two facts if it were not for the graph. But there’s more. The graph appears to be odd. Again, we can 
verify that this is indeed the case: 

$$
\begin{aligned}
g(-\xi) &= (4(-\xi)^5 - 20(-\xi)^3 + 15(-\xi))e^{-(-\xi)^2} \\
 &= -(4\xi^5 - 20\xi^3 + 15\xi)e^{-\xi^2} \\
 &= -g(\xi).
\end{aligned}
$$

Due to this symmetry, we may focus on finding extrema for positive values of $\xi$. And since we are ultimately 
interested in maximizing $|g|$, it is a good time to consider the graph of $|g(\xi)|$ over $\xi \in [0, 4]$: 

[Figure: graph of $|g(\xi)|$ over [0, 4], with the first relative maximum marked by a red plus]

Finally, we can tackle the maximization. The relative maximum, marked with a red plus, will be the key to 
the answer. Let the coordinates of this point be $(\hat{\xi}, g(\hat{\xi}))$. Then, since $|g(\xi)|$ is increasing on the interval from 
0 to $\hat{\xi}$, we can conclude that 

$$\max_{\xi \in [0,x]} |g(\xi)| = |g(x)| = g(x)$$

for all $x$ between 0 and $\hat{\xi}$. Moreover, 

$$\max_{\xi \in [0,x]} |g(\xi)| = |g(\hat{\xi})| = g(\hat{\xi})$$

for all $x > \hat{\xi}$. By symmetry, we can conclude that $\max_{\xi \in [x,0]} |g(\xi)| = |g(x)|$ for $x$ between $-\hat{\xi}$ and 0, and 
$\max_{\xi \in [x,0]} |g(\xi)| = g(\hat{\xi})$ for all $x < -\hat{\xi}$. Putting it all together, 

$$
|T_4(x) - f(x)| = |R_4(x)| \le
\begin{cases}
\frac{|x|^5}{15}|g(x)| & \text{if } |x| \le \hat{\xi} \\
\frac{|x|^5}{15}g(\hat{\xi}) & \text{if } |x| > \hat{\xi}
\end{cases}
$$

Granted, we do not know the values of $\hat{\xi}$ or $g(\hat{\xi})$, but we can approximate them using a graphing calculator: 
$(\hat{\xi}, g(\hat{\xi})) \approx (.43607, 4.0892)$. 
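
In place of a graphing calculator, the location and height of the key relative maximum can be estimated with a brute-force grid search in Octave. This is a sketch under stated assumptions: the grid spacing of $10^{-5}$ and the variable names xihat and gmax are choices made for this example.

```octave
% g(xi) from the solution of exercise 30b
g = @(xi) (4*xi.^5 - 20*xi.^3 + 15*xi).*exp(-xi.^2);

xi = linspace(0, 4, 400001);        % grid on [0,4] with spacing 1e-5
[gmax, idx] = max(abs(g(xi)));      % largest value of |g| on the grid
xihat = xi(idx)                     % approximate location of the maximum
gmax                                % approximate maximum value
```

A grid search only locates the maximum to within the grid spacing, which is enough to confirm the five digits quoted above.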
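Section 1.3 

1b: We need to find $\alpha$ such that $\lim_{n \to \infty} \frac{1/3^{e^{n+1}}}{\left(1/3^{e^n}\right)^{\alpha}} = \lambda$ for some $\lambda \ne 0$. So, taking a close look at $\frac{1/3^{e^{n+1}}}{\left(1/3^{e^n}\right)^{\alpha}}$ should 
help: 

$$\frac{1/3^{e^{n+1}}}{\left(1/3^{e^n}\right)^{\alpha}} = \frac{3^{\alpha e^n}}{3^{e^{n+1}}} = \frac{3^{\alpha e^n}}{3^{e \cdot e^n}}.$$

Consequently, if $\alpha = e$, then $\frac{1/3^{e^{n+1}}}{\left(1/3^{e^n}\right)^{\alpha}} = 1$, from which it follows that $\lim_{n \to \infty} \frac{1/3^{e^{n+1}}}{\left(1/3^{e^n}\right)^{\alpha}} = 1$. Therefore, the 
order of convergence is $\alpha = e$. 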


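1c: We need to find $\alpha$ such that $\lim_{n \to \infty} \frac{\left|\frac{2^{2^{n+1}} - 2}{2^{2^{n+1}} + 3} - 1\right|}{\left|\frac{2^{2^n} - 2}{2^{2^n} + 3} - 1\right|^{\alpha}} = \lambda$ for some $\lambda \ne 0$. So, taking a close look at this ratio 
should help: 

$$
\frac{\left|\frac{2^{2^{n+1}} - 2}{2^{2^{n+1}} + 3} - 1\right|}{\left|\frac{2^{2^n} - 2}{2^{2^n} + 3} - 1\right|^{\alpha}}
= \frac{\left|\frac{2^{2^{n+1}} - 2}{2^{2^{n+1}} + 3} - \frac{2^{2^{n+1}} + 3}{2^{2^{n+1}} + 3}\right|}{\left|\frac{2^{2^n} - 2}{2^{2^n} + 3} - \frac{2^{2^n} + 3}{2^{2^n} + 3}\right|^{\alpha}}
= \frac{\left|\frac{-5}{2^{2^{n+1}} + 3}\right|}{\left|\frac{-5}{2^{2^n} + 3}\right|^{\alpha}}
= 5^{1-\alpha} \cdot \frac{(2^{2^n} + 3)^{\alpha}}{2^{2^{n+1}} + 3}.
$$

If $\alpha = 2$, the leading terms in both numerator and denominator of the resulting fraction will match. This is 
strong evidence that $\alpha = 2$ is the right choice. Let’s try it: 

$$
\begin{aligned}
5^{1-2} \cdot \frac{(2^{2^n} + 3)^2}{2^{2^{n+1}} + 3} &= \frac{1}{5} \cdot \frac{2^{2 \cdot 2^n} + 6 \cdot 2^{2^n} + 9}{2^{2^{n+1}} + 3} \\
 &= \frac{1}{5} \cdot \frac{2^{2^{n+1}} + 6 \cdot 2^{2^n} + 9}{2^{2^{n+1}} + 3} \\
 &= \frac{1}{5} \cdot \frac{1 + 6 \cdot 2^{-2^n} + 9 \cdot 2^{-2^{n+1}}}{1 + 3 \cdot 2^{-2^{n+1}}}
\end{aligned}
$$

In the last step, we have divided both numerator and denominator by $2^{2^{n+1}}$ to make taking the limit as $n$ 
approaches $\infty$ simple: 

$$
\lim_{n \to \infty} \frac{\left|\frac{2^{2^{n+1}} - 2}{2^{2^{n+1}} + 3} - 1\right|}{\left|\frac{2^{2^n} - 2}{2^{2^n} + 3} - 1\right|^2}
= \lim_{n \to \infty} \frac{1}{5} \cdot \frac{1 + 6 \cdot 2^{-2^n} + 9 \cdot 2^{-2^{n+1}}}{1 + 3 \cdot 2^{-2^{n+1}}} = \frac{1}{5}.
$$

So, the order of convergence is $\alpha = 2$. 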
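
The limit can also be observed numerically. The sketch below computes the errors $|p_n - 1| = \frac{5}{2^{2^n} + 3}$ directly from that formula (rather than by subtracting from 1, which would lose digits to cancellation) and forms the ratios with $\alpha = 2$. The range n = 1, ..., 5 is an assumption chosen to stay well within floating point limits.

```octave
% errors e_n = |p_n - 1| for p_n = (2^(2^n) - 2)/(2^(2^n) + 3)
n = 1:5;
e = 5./(2.^(2.^n) + 3);

% ratios e_{n+1}/e_n^2 should approach lambda = 1/5
ratio = e(2:end)./e(1:end-1).^2
```

The ratios decrease toward 0.2, consistent with quadratic convergence and $\lambda = \frac{1}{5}$.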


6c: To begin, we are looking for a function of the form $\frac{1}{n^p}$ or the form $\frac{1}{b^n}$ that will be at least as great as $\left|\frac{\sin n}{\sqrt{n}}\right|$ for 
large $n$. In the end, though, we want the smallest such function (up to a constant). The key to the solution 
is to note that $|\sin n| \le 1$ for all $n$: 

$$\left|\frac{\sin n}{\sqrt{n}}\right| = \frac{|\sin n|}{\sqrt{n}} \le \frac{1}{\sqrt{n}} = \frac{1}{n^{1/2}}.$$

Since this inequality will not hold for any higher power of $n$, the rate of convergence is $O\left(\frac{1}{n^{1/2}}\right)$. 
6d: To begin, we are looking for a function of the form $\frac{1}{n^p}$ or the form $\frac{1}{b^n}$ that will be at least as great as 
$\frac{4}{10^n + 35n + 9}$ 
for large $n$. In the end, though, we want the smallest such function (up to a constant). The key to the solution 
is to note that $10^n + 35n + 9 > 10^n$ for all $n$: 

$$\frac{4}{10^n + 35n + 9} \le \frac{4}{10^n}.$$

Since this inequality will not hold for any base greater than 10, the rate of convergence is $O\left(\frac{1}{10^n}\right)$. 


6e: To begin, we are looking for a function of the form $\frac{1}{n^p}$ or the form $\frac{1}{b^n}$ that will be at least as great as 
$\frac{4}{10^n - 35n - 9}$ 
for large $n$. In the end, though, we want the smallest such function (up to a constant). The key to the solution 
is dealing with the fact that $10^n - 35n - 9 < 10^n$ for all $n$: 

$$\frac{4}{10^n - 35n - 9} = \frac{8}{2 \cdot 10^n - 70n - 18} = \frac{8}{10^n + (10^n - 70n - 18)} \le \frac{8}{10^n}$$

for sufficiently large $n$ since $10^n - 70n - 18 > 0$ for all large $n$. Since no similar inequality will hold for any 
base greater than 10, the rate of convergence is $O\left(\frac{1}{10^n}\right)$. Notice we have the same rate of convergence as in 
question 6d even though we ended up with a larger constant. The rate of convergence is not dependent on 
the constant needed in the inequality. 

6k: To begin, we are looking for a function of the form $\frac{1}{n^p}$ or the form $\frac{1}{b^n}$ that will be at least as great as $\frac{n^2}{2^n}$ 
for large $n$. In the end, though, we want the smallest such function (up to a constant). Let $2 > \varepsilon > 0$ be 
arbitrary. Notice that $\frac{n^2}{2^n} \le \frac{1}{(2 - \varepsilon)^n}$ for large $n$ by rearranging the inequality like so: $\frac{n^2}{2^n} \le \frac{1}{(2 - \varepsilon)^n}$ if and only 
if $n^2(2 - \varepsilon)^n \le 2^n$ if and only if $n^2 \le \left(\frac{2}{2 - \varepsilon}\right)^n$. We know this last inequality to be true for sufficiently large $n$ 
because $\frac{2}{2 - \varepsilon} > 1$, and exponential functions dominate polynomial functions. Hence, we can use any rate of 
convergence of the form $O\left(\frac{1}{(2 - \varepsilon)^n}\right)$, but there is no smallest such function. Hence, we are left simply using 
$O\left(\frac{n^2}{2^n}\right)$ as the rate of convergence. 

13: One possible .m file is: 

for j=0:9 
  disp(7^j) 
end%for 

15: One possible .m file is: 

f=inline('(2^(2^x)-2)/(2^(2^x)+3)'); 
n=[0,1,2,4,6,10]; 
for i=1:6 
  disp(f(n(i))) 
end%for 


19b: For a sequence with linear order of convergence, we know the number of significant digits increases by 
approximately $-\log \lambda$ with each iteration, so we need to find the smallest $k$ such that $1 - k\log(0.5) \ge 12$. Solving 
the equation $1 - k\log(0.5) = 12$ for $k$: 

$$
\begin{aligned}
1 - k\log(0.5) &= 12 \\
-k\log(0.5) &= 11 \\
k &= \frac{11}{-\log(0.5)} \approx 36.54.
\end{aligned}
$$

Therefore, it will take 37 iterations, using the rule of thumb. Remember, this estimate is only good as long as 
$\frac{|p_{n+1} - p|}{|p_n - p|} \approx \lambda$. So, if the actual value of the ratio is significantly different from $\lambda$, the estimate of 37 iterations 
could be significantly off. 
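
The arithmetic of the rule of thumb is easy to reproduce. In this sketch the variable names lambda, k, and iterations are assumptions for the example, and log10 is used because significant digits are counted in base 10.

```octave
lambda = 0.5;                     % asymptotic error ratio
k = (12 - 1)/(-log10(lambda))     % solves 1 - k*log10(lambda) = 12
iterations = ceil(k)              % smallest whole number of iterations
```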
Section 1.4 

8: (a) trominos.m may be downloaded at the companion website. 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
% trominos() written by Leon Q. Brin 14 February 2013   % 
% is a recursively defined function for                 % 
% calculating the number of trominos needed to          % 
% cover an n X n grid of squares, save one corner       % 
% INPUT: nonnegative integer n.                         % 
% OUTPUT: T(n)                                          % 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
function ans = trominos(n) 
  if (n==0) 
    ans = 0; 
  else 
    ans = 1+4*trominos(n-1); 
  end%if 
end%function 

(b) 

octave:1> trominos(10) 
ans = 349525 
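
Unrolling the recursion $T(n) = 1 + 4T(n-1)$ with $T(0) = 0$ gives the geometric sum $1 + 4 + \cdots + 4^{n-1} = \frac{4^n - 1}{3}$, which offers an independent check on the recursive code. In the sketch below, trominos_count is a hypothetical stand-in name for the function above, renamed so the example is self-contained.

```octave
1;  % mark this file as a script so a function can be defined in it

% recursive count, same logic as trominos.m (name is an assumption)
function t = trominos_count(n)
  if n == 0
    t = 0;
  else
    t = 1 + 4*trominos_count(n - 1);
  end
end

n = 0:10;
recursive = arrayfun(@trominos_count, n);
closed = (4.^n - 1)/3;            % closed form of the recursion
agree = isequal(recursive, closed)
```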

9: (a) 7. Follow this sequence of moves: 




(b) i. Consider the following set of moves. 


This demonstrates that the 4-disk game can be completed by completing the 3-disk game twice (the first and 
last moves) plus one extra move (moving the bottom disk) . There is no quicker way to do it because the top 3 
disks must be moved off the bottom one before the bottom one can move. Then the bottom one must move, 
and must take at least one move. Then the three top disks must be put back on top of the bottom disk. Since 
we already know the minimum number of moves to move a stack of 3 disks, this diagram shows a minimum 
number of moves to complete the 4-disk game. 

ii. It takes a minimum of 2 • 7 + 1, or 15, moves to complete the 4-disk game. 
10: (a) One. Just move the disk to another peg. 

(b) hanoi.m may be downloaded at the companion website. 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
% hanoi() written by Leon Q. Brin 14 February 2013      % 
% is a recursively defined function for                 % 
% calculating the number of moves needed to             % 
% complete the Tower of Hanoi with n disks.             % 
% INPUT: positive integer n.                            % 
% OUTPUT: H(n)                                          % 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
function ans = hanoi(n) 
  if (n==1) 
    ans = 1; 
  else 
    ans = 1+2*hanoi(n-1); 
  end%if 
end%function 

(c) 

octave:1> hanoi(10) 
ans = 1023 


12a: This is asking for the number of ways to partition a set of 10 elements into a single nonempty subset. There 
is only one way since there is only one subset allowed. That is, the “partition” contains just the set itself. So, 
$S(10, 1) = 1$. 

12d: This question is asking for the number of ways to partition a set of 4 elements into two nonempty subsets. 
As implied by the question, the actual elements of the set are immaterial, so we can work with any set of 
four elements and arrive at the correct answer. Consider the set $\{\alpha, \beta, \gamma, \delta\}$. The list of all partitions can be 
categorized into those where one of the subsets has 1 element, one of the sets has 2 elements, or one of the 
sets has 3 elements. One does not have a partition of nonempty subsets if one of the sets contains 0 or 4 
elements. Here is the list of partitions where one of the sets has exactly one element: 

$\{\{\alpha\}, \{\beta, \gamma, \delta\}\}$, $\{\{\beta\}, \{\alpha, \gamma, \delta\}\}$, $\{\{\gamma\}, \{\alpha, \beta, \delta\}\}$, $\{\{\delta\}, \{\alpha, \beta, \gamma\}\}$ 

Note that this is also the list of all partitions where one of the sets has exactly three elements. Here is the 
list of partitions where one of the sets has exactly two elements (and, therefore, the other set also has two 
elements): 

$\{\{\alpha, \beta\}, \{\gamma, \delta\}\}$, $\{\{\alpha, \gamma\}, \{\beta, \delta\}\}$, $\{\{\alpha, \delta\}, \{\beta, \gamma\}\}$ 

There are no other partitions. Since we have listed 7 partitions, $S(4, 2) = 7$. 

13: (a) S(n, 1) is the number of ways to partition a set of n elements into 1 nonempty subset. Of course, this is 1. 
The only such partition contains the set itself. 

(b) S(n,n) is the number of ways to partition a set of n elements into n nonempty subsets. Since the set 
contains only n elements and we need to divide them among n subsets, each subset of the partition must 
contain exactly one element, thus forming a partition of singleton sets. Since order does not matter in a 
partition, there is only one way to do this. Thus, S(n,n) = 1. 
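
The two boundary values $S(n, 1) = 1$ and $S(n, n) = 1$ are exactly what a recursive computation needs as base cases. The sketch below assumes the standard recurrence $S(n, k) = S(n-1, k-1) + k\,S(n-1, k)$ for $1 < k < n$, which is not derived in these solutions; the function name stirling2 is also an assumption for this example.

```octave
1;  % mark this file as a script so a function can be defined in it

% Number of partitions of an n-element set into k nonempty subsets,
% for integers 1 <= k <= n (hypothetical helper for this sketch).
function s = stirling2(n, k)
  if k == 1 || k == n
    s = 1;                        % base cases from 13(a) and 13(b)
  else
    % assumed standard recurrence: the last element is either a
    % singleton or joins one of the k existing subsets
    s = stirling2(n - 1, k - 1) + k*stirling2(n - 1, k);
  end
end

s42 = stirling2(4, 2)             % compare with the count in 12d
s101 = stirling2(10, 1)           % compare with 12a
```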

16: 987. If we take a stack that is $n - 1$ inches high and add a block that is 1 inch high, we have a stack that is 
$n$ inches high with the top block being 1 inch tall. If we take a stack that is $n - 2$ inches high and add a 
block that is 2 inches high, we have a stack that is $n$ inches high with the top block being 2 inches tall. Any 
stack created by adding a 1-inch block to a stack that is $n - 1$ inches tall is necessarily different from a stack 
created by adding a 2-inch block to a stack that is $n - 2$ inches tall since the top blocks are different. Now, if 
we take all the stacks that are $n - 1$ inches high and add 1-inch blocks to them, we have all the stacks that 
are $n$ inches high and have a 1-inch block on top. And if we take all the stacks that are $n - 2$ inches high 
and add 2-inch blocks to them, we have all the stacks that are $n$ inches high and have a 2-inch block on top. 
There are no other $n$-inch high stacks since any such stack will either have a 1-inch block or a 2-inch block on 
top. Therefore, the number of $n$-inch high stacks is just the number of $(n - 1)$-inch stacks plus the number of 
$(n - 2)$-inch stacks. Of course, this doesn’t make sense for $n = 1$ or $n = 2$, so we need to specify that there 
is exactly 1 way to create a stack of blocks 1 inch high (one 1-inch block), and there are exactly two ways to 
create a stack of blocks 2 inches high (two 1-inch blocks or one 2-inch block). Now we can use the recursive 
answer to find out how many ways there are of building taller stacks. The number of 3-inch stacks is the number of 
2-inch stacks plus the number of 1-inch stacks, or 2 + 1 = 3. The number of 4-inch stacks is the number of 
3-inch stacks plus the number of 2-inch stacks, or 3 + 2 = 5. The number of 5-inch stacks is the number of 
4-inch stacks plus the number of 3-inch stacks, or 5 + 3 = 8. Continuing this way reveals the following table: 

n                         6    7    8    9    10   11   12   13   14   15 
number of n-inch stacks   13   21   34   55   89   144  233  377  610  987 
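
The table can be generated with a short loop that applies the recursion directly. The array name stacks is an assumption made for this sketch.

```octave
% stacks(n) = number of n-inch stacks built from 1-inch and 2-inch blocks
N = 15;
stacks = zeros(1, N);
stacks(1) = 1;                    % one 1-inch block
stacks(2) = 2;                    % two 1-inch blocks or one 2-inch block
for n = 3:N
  stacks(n) = stacks(n - 1) + stacks(n - 2);
end
disp(stacks(6:15))
```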


Section 2.1 

2c: Since $g$ is a polynomial, it is continuous on [0, 0.9]. $g(0) = 2$ and $g(0.9) = -.1897$ so $g$ has opposite signs on 
the endpoints of [0, 0.9]. Therefore, the Intermediate Value Theorem guarantees a root on the interval [0, 0.9]. 

2f: The discontinuities of $g$ are at $\pm 1$ due to the $(1 - t^2)$ factor in the denominator and at odd multiples of $\frac{\pi}{2}$ 
due to the $(\tan t)$ factor in the numerator. None of these discontinuities occurs in the interval [21.5, 22.5], so 
$g$ is continuous on it. $g(21.5) \approx 1.6 > 0$ and $g(22.5) \approx -1.6 < 0$ so $g$ has opposite signs on the endpoints 
of [21.5, 22.5]. Therefore, the Intermediate Value Theorem guarantees a root on the interval [21.5, 22.5]. 
Incidentally, the discontinuities closest to [21.5, 22.5] are $\frac{13\pi}{2} \approx 20.42$ and $\frac{15\pi}{2} \approx 23.56$. 

3: There is no single correct table for executing the bisection method. Anything that shows successive choices of 
interval and accompanying computations will do. 

For $g(x) = 3x^4 - 2x^3 - 3x + 2$ on [0, 0.9]: 

a       g(a)     b       g(b)      m        g(m) 
0       2        .9      -.1897    .45      .5907 
.45              .9                .675     -.01731 
.45              .675              .5625 

The third iteration of the bisection method is 0.5625. 

For $g(t) = \frac{3t^2 \tan t}{1 - t^2}$ on [21.5, 22.5]: 

a       g(a)     b       g(b)      m        g(m) 
21.5    1.608    22.5    -1.676    22       -.02660 
21.5             22                21.75    .7393 
21.75            22                21.875 

The third iteration of the bisection method is 21.875. 
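
The first table can be reproduced with a few lines of Octave. This is a minimal sketch of the bisection update, assuming the sign test shown below; it is not the bisection code used elsewhere in the text.

```octave
g = @(x) 3*x.^4 - 2*x.^3 - 3*x + 2;
a = 0; b = 0.9;
for j = 1:3
  m = (a + b)/2;                  % midpoint of the current interval
  printf("%d  a = %.4f  b = %.4f  m = %.4f  g(m) = %.5f\n", j, a, b, m, g(m));
  if sign(g(m)) == sign(g(a))
    a = m;                        % root lies in [m, b]
  else
    b = m;                        % root lies in [a, m]
  end
end
```

After the loop, m holds the third iteration.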

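11: The error $|m_j - p| \le \frac{b - a}{2^j}$, and we need this quantity to be less than or equal to $10^{-3}$. So we need to solve the 
inequality $\frac{b - a}{2^j} \le 10^{-3}$ for $j$. $b - a = 4 - 1 = 3$, so we need to find $j$ such that $\frac{3}{2^j} \le 10^{-3}$: 

$$
\begin{aligned}
\ln\left(\frac{3}{2^j}\right) &\le \ln(10^{-3}) \\
\ln(3) - \ln(2^j) &\le -3\ln(10) \\
\ln(3) + 3\ln(10) &\le j\ln(2) \\
\frac{\ln(3) + 3\ln(10)}{\ln(2)} &\le j
\end{aligned}
$$

So we need $j \ge \frac{\ln(3) + 3\ln(10)}{\ln(2)} \approx 11.55$. The least integer satisfying this inequality is 12. We need 12 iterations. 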
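
The same count follows from one line of Octave. The variable names a, b, tol, and j are assumptions for this sketch.

```octave
a = 1; b = 4; tol = 1e-3;
j = ceil(log((b - a)/tol)/log(2))   % least j with (b-a)/2^j <= tol
```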
21: $\sin(4^2) = \sin(16) < 0$ and $\sin(5^2) = \sin(25) < 0$ so the assumptions of the bisection method are not met on [4, 5] as 
stated. However, if the bisection method is run anyway, the first iteration will be 4.5 and $\sin(4.5^2) > 0$. No 
matter which endpoint (left or right) becomes 4.5, the assumptions of the bisection method will be met from 
here on. It will work as prescribed starting with the second iteration, and, therefore, will return a root. 


Section 2.2 


2c: (i) g does satisfy the hypotheses of the Mean Value Theorem on [0, 0.9]. The hypotheses of the Mean Value
Theorem require a function to be continuous on the closed interval [a, b] and have a derivative on the open
interval (a, b). In this question, a = 0 and b = 0.9. Since g is a polynomial, it is continuous over all real
numbers. Therefore, g is continuous over [0, 0.9] = [a, b]. Furthermore, g' is a polynomial and exists over all
real numbers, so g has a derivative on (0, 0.9) = (a, b). Remark: g actually satisfies the hypotheses of the
Mean Value Theorem on any closed interval, as do all polynomials.

(ii) We need to find c such that $g'(c) = \frac{g(0.9) - g(0)}{0.9 - 0}$. To begin, $g'(x) = 12x^3 - 6x^2 - 3$, $g(0) = 2$, and
$g(0.9) = 3(.9)^4 - 2(.9)^3 - 3(.9) + 2 = -.1897$. So we need to solve $12c^3 - 6c^2 - 3 = \frac{-.1897 - 2}{.9}$ for c:

$$12c^3 - 6c^2 - 3 = -\frac{2433}{1000}$$
$$12c^3 - 6c^2 - \frac{567}{1000} = 0.$$

We cannot solve this equation using basic techniques of algebra since the cubic does not factor. However, we
know the solution is between 0 and 0.9, so we can apply the bisection method to get an answer! Using Octave
with a tolerance of $10^{-10}$, we get


ans = 0.622093084518565. 
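
For readers who want to reproduce this value, here is a minimal bisection sketch in Octave. It is not the bisection routine from the text (whose name and signature are not shown here); it is a bare loop written for this one equation, and the names q, a, b, and tol are illustrative.

```octave
% Sketch: bisection on q(c) = 12c^3 - 6c^2 - 567/1000 over [0, 0.9].
q = @(c) 12*c.^3 - 6*c.^2 - 567/1000;
a = 0; b = 0.9;                 % q(a) < 0 < q(b), so a root lies between
tol = 1e-10;
while (b - a)/2 > tol
  m = (a + b)/2;
  if sign(q(m)) == sign(q(a))
    a = m;                      % root is in the right half
  else
    b = m;                      % root is in the left half
  end
end
c = (a + b)/2
```

Because the stopping rule only guarantees agreement to within the tolerance, the trailing digits of c may differ slightly from the value quoted above.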

2f: g does not satisfy the hypotheses of the Mean Value Theorem on [20, 23]. The discontinuities of g are at $\pm 1$ due
to the $(1 - t^2)$ factor in the denominator and at odd multiples of $\frac{\pi}{2}$ due to the $(\tan t)$ factor in the numerator.
The discontinuity at $\frac{13\pi}{2} \approx 20.42$ is in the interval [20, 23], so g is not continuous over the given interval.

3h: We are asked to find the fixed points of h. By definition, a fixed point of h satisfies the equation $h(x) = x$, so
we are looking for all such values. $h(x) = x - 10 + 3^x + 25 \cdot 3^{-x}$, so we need to solve $x - 10 + 3^x + 25 \cdot 3^{-x} = x$:

$$x - 10 + 3^x + 25 \cdot 3^{-x} = x$$
$$-10 + 3^x + 25 \cdot 3^{-x} = 0$$
$$3^x - 10 + 25 \cdot 3^{-x} = 0$$
$$3^x \cdot 3^x - 3^x \cdot 10 + 3^x \cdot 25 \cdot 3^{-x} = 0$$
$$(3^x)^2 - 10 \cdot 3^x + 25 = 0.$$

$(3^x)^2 - 10 \cdot 3^x + 25$ is quadratic in $3^x$, so we can try to factor. This quadratic does factor:

$$(3^x - 5)^2 = 0$$
$$3^x - 5 = 0$$
$$3^x = 5$$
$$\log_3 3^x = \log_3 5$$
$$x = \log_3 5.$$


Therefore, there is one fixed point of h, $x = \log_3 5 \approx 1.465$.

4d: We are looking for roots of $g(x) = x^2 - e^{3x+4}$, so we need to solve the equation $x^2 - e^{3x+4} = 0$ for x. In
order to do so with a fixed point method, we need to manipulate this equation into one of the form $f(x) = x$
using algebra. The simplest way is to add x to both sides. This gives us $x + x^2 - e^{3x+4} = x$, so we may take
$f_1(x) = x + x^2 - e^{3x+4}$. Another way to transform the equation $x^2 - e^{3x+4} = 0$ is to "solve" for the x in the
$x^2$ term. Adding $e^{3x+4}$ to both sides, we have $x^2 = e^{3x+4}$, and now applying the square root to both sides we
have $|x| = \sqrt{e^{3x+4}}$ or $x = \pm e^{(3x+4)/2}$. We may now set $f_2(x) = e^{(3x+4)/2}$ or $f_2(x) = -e^{(3x+4)/2}$.

Remark: We can also "solve" for the x in the exponential:

$$x^2 - e^{3x+4} = 0$$
$$x^2 = e^{3x+4}$$
$$\ln x^2 = \ln(e^{3x+4})$$
$$2\ln x = 3x + 4$$
$$2\ln x - 4 = 3x$$
$$\frac{2\ln x - 4}{3} = x.$$

This gives another candidate function, $f_3(x) = \frac{2\ln x - 4}{3}$.

Remark: There are always infinitely many ways to turn the equation $g(x) = 0$ into an equation of the form
$f(x) = x$. We can multiply both sides by any nonzero real number, c, and then add x to both sides.
This gives the infinitely many candidates $f_c(x) = x + c\,g(x)$.

Remark: See question 20 for another infinite set of candidates. 

5b: We are asked to calculate the first 5 iterations of the fixed point iteration method applied to $g(x) = 10 + x - \cosh(x)$
beginning with (initial value) $x_0 = -3$. We have to apply g to $x_0$, then apply g to the result to get a
new result, then apply g to the new result to get a newer result, then apply g to the newer result to get yet
another result, and so on, until we have 5 results:

$x_0 = -3$
$x_1 = g(x_0) = 10 - 3 - \cosh(-3) \approx -3.067661995777765$
$x_2 = g(x_1) = 10 + x_1 - \cosh(x_1) \approx -3.836725126419593$
$x_3 = g(x_2) = 10 + x_2 - \cosh(x_2) \approx -17.03418648356706$
$x_4 = g(x_3) = 10 + x_3 - \cosh(x_3) \approx -12497508.54310043$
$x_5 = g(x_4) = 10 + x_4 - \cosh(x_4)$: 'floating point overflow'

So the first 5 iterations are (approximately) -3.067, -3.836, -17.03, $-1.249(10)^7$, and a floating point error.
It does not look like fixed point iteration is converging on a fixed point. The numbers are getting larger in 
magnitude with each iteration. 


Remark: Calculators and computers using standard floating point arithmetic will not be able to calculate
$\cosh(-12497508.54310043)$ because it is too big! Thus the overflow. It does not mean it cannot be
calculated. It's just too large for a floating point calculator. Using a computer algebra system with
capability to handle such numbers, we find that

$$x_5 \approx -4.97(10)^{5427598}.$$

$x_5$ has over 5 million digits to the left of the decimal point! Indeed, the magnitude of each iteration is
greater than the last.
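
The five iterations can be reproduced with a short loop. This is a minimal sketch rather than the fixed point routine used elsewhere in these solutions; the array x is illustrative, and because Octave indexes from 1, x(k+1) holds $x_k$. In IEEE double precision the overflow does not raise an error: cosh returns Inf, so the last entry displays as -Inf.

```octave
% Sketch: first five fixed point iterations of g(x) = 10 + x - cosh(x) from x_0 = -3.
g = @(x) 10 + x - cosh(x);
x = zeros(1, 6);
x(1) = -3;                 % x(1) holds x_0
for k = 1:5
  x(k+1) = g(x(k));        % x(k+1) holds x_k
end
format long
disp(x(:))                 % the final entry overflows to -Inf
```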

6b: Using Octave with a properly programmed fixed point iteration function, we get the following:


fixedPointIteration(inline('10+x-cosh(x)'),-3,1e-10,100)
ans = Method failed maximum number of iterations reached

The method does not converge in 100 iterations. 

Remark: As we find out in question 5b, this iteration causes an overflow in just 5 iterations. 


7b: The web diagram will look something like this: 

Remark: The line y = x is not set at a 45° angle because the aspect ratio of the graph is not 1:1. The
y-axis covers a length of 20, from -20 to 0, while the x-axis covers a length of only 3, from -5 to -2.


10: (a) To establish that f has a unique fixed point on [-4, -.9], we will show that f is continuous on [-4, -.9],
$f([-4, -.9]) \subseteq [-4, -.9]$, and $|f'(x)| < 1$ for all $x \in (-4, -.9)$. Proposition 3 gives us the result.


(i) f is continuous on [-4, -.9] because its only discontinuity is at $x = -\frac{2}{3}$ where the denominator, $6x + 4$,
is zero, and $-\frac{2}{3} \approx -.6667$ is not in [-4, -.9].

(ii) We find the absolute extrema of f over [-4, -.9]. $f'(x) = \frac{18x^2 + 24x + 6}{36x^2 + 48x + 16} = \frac{3(3x+1)(x+1)}{2(3x+2)^2}$ has zeroes at
$x = -1$ and $x = -\frac{1}{3}$ and is undefined at $x = -\frac{2}{3}$. The only relevant critical value is -1, so we check
$f(-4) = -\frac{47}{20} = -2.35$, $f(-1) = -1$, and $f(-.9) = -\frac{143}{140} \approx -1.021$. Hence, $f([-4, -.9]) \subseteq [-2.35, -1] \subseteq [-4, -0.9]$.
Remark: For many functions, we can be happy enough with visual evidence or at least use
the graph to verify our conclusions. In this question, the graph of f for both x and y values from -4 to
-.9 looks like



The graph of the function does not leave the view through the top (no values greater than -.9) or the
bottom (no values less than -4), so $f([-4, -.9]) \subseteq [-4, -.9]$.

(iii) We find the absolute extrema of f' over [-4, -.9]. $f''(x) = \frac{3}{27x^3 + 54x^2 + 36x + 8} = \frac{3}{(3x+2)^3}$ has no zeroes and
is undefined only at $x = -\frac{2}{3}$. There are no relevant critical values, so we check $f'(-4) = \frac{99}{200} = 0.495$ and
$f'(-.9) = -\frac{51}{98} \approx -.5204$. Hence, $-\frac{51}{98} < f'(x) < \frac{99}{200}$ for all $x \in (-4, -.9)$, which means $|f'(x)| < \frac{51}{98} < 1$
for all $x \in (-4, -.9)$. Remark: As with check (ii), we can be happy enough with visual evidence or
at least use the graph to verify our conclusions. In this question, the graph of f' for $x \in [-4, -.9]$ and
$y \in [-1, 1]$ looks like

The graph of the function does not leave the view through the top (no values greater than 1) or the
bottom (no values less than -1), so $|f'(x)| < 1$ for all $x \in (-4, -.9)$.

(b) Using the fixed point iteration method as described in the text with tolerance $10^{-2}$ and $x_0 = -4$, we get
$x_6 = -1.00000176319$, and we presume this is accurate to within $10^{-2}$ of the actual fixed point. Remark:
Since we don't have a dependable way to calculate the error, it is possible that the final answer will not be
within tolerance of the actual root. In this case, though, the actual fixed point is -1, so we are well within
bounds.

12: First, $f(x) = \sqrt[3]{8 - 4x} = x \Rightarrow 8 - 4x = x^3 \Rightarrow x^3 + 4x - 8 = 0$, so any fixed point of f is a root of g. It
remains to show that the fixed point iteration method will converge to a fixed point of f for any initial value
$x_0 \in [1.2, 1.5]$. According to the Fixed Point Convergence Theorem, we need to establish that [1.2, 1.5] is a
neighborhood of a fixed point in which the magnitude of the derivative is less than 1.


(i) To establish that there is a fixed point in [1.2, 1.5], note that f is continuous and that
$f(1.2) - 1.2 = \sqrt[3]{3.2} - 1.2 \approx .27 > 0$ and $f(1.5) - 1.5 = \sqrt[3]{2} - 1.5 \approx -.24 < 0$. The Intermediate Value Theorem
guarantees there will be a value $c \in (1.2, 1.5)$ such that $f(c) - c = 0$, or $f(c) = c$.

(ii) We need to establish that the magnitude of the derivative of f is less than 1 for all $x \in (1.2, 1.5)$.
$f'(x) = -\frac{4}{3(8-4x)^{2/3}}$ and $f''(x) = -\frac{32}{9(8-4x)^{5/3}}$. Since $f''(x) < 0$ for all $x \in (1.2, 1.5)$, we know f' is
decreasing over this interval. For this reason and the fact that $f'(x) < 0$ for all $x \in (1.2, 1.5)$, we know
$|f'(x)|$ is bounded by $|f'(1.5)| = \frac{2\sqrt[3]{2}}{3} \approx .84 < 1$.
This completes the proof.


Section 2.3 

5: Because there is no particular pattern to the values n is to take, we will store the six values in an array. Then
we will loop over the array to get the values of f.

n=[0,1,2,4,6,10];
f=inline('(2^(2^x)-2)/(2^(2^x)+3)');
i=1;
while (i<7)
  disp(f(n(i)));
  i=i+1;
end%while

produces the following output:

0
0.285714285714286
0.736842105263158
0.999923709546987
1
NaN

Remark: We can avoid the NaN, read "Not a Number", on the sixth value by rewriting the function as the
algebraically equivalent f=inline('(1-2*2^-(2^x))/(1+3*2^-(2^x))'). With this one change to the
above program, the following output is produced:

0
0.285714285714286
0.736842105263158
0.999923709546987
1
1

This works because 2^(2^10), which equals $2^{1024}$, produces an overflow while 2^-(2^10), which equals
$2^{-1024}$, evaluates to a positive number so small (roughly $5.6(10)^{-309}$) that adding a multiple of it to 1 changes nothing.
$2^{1024} \approx 1.8(10)^{308}$ is too big to be represented as a standard floating point value.
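
The two formulations can be compared side by side. The sketch below uses anonymous functions instead of inline; the names fDirect and fScaled are illustrative and do not come from the text. The scaled form divides numerator and denominator by $2^{2^x}$, which is why it stays finite.

```octave
% Sketch: direct versus rescaled evaluation of (2^(2^x) - 2)/(2^(2^x) + 3).
n = [0, 1, 2, 4, 6, 10];
fDirect = @(x) (2^(2^x) - 2) / (2^(2^x) + 3);              % Inf/Inf = NaN at x = 10
fScaled = @(x) (1 - 2*2^(-(2^x))) / (1 + 3*2^(-(2^x)));    % same function, no overflow
for i = 1:numel(n)
  printf('%2d  %18.15f  %18.15f\n', n(i), fDirect(n(i)), fScaled(n(i)));
end
```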


11:


(a) Proceeding according to proposition 5, we will need an initial error and a bound on the magnitude of the
derivative of f.

(i) All we know about the initial value, $x_0$, and the fixed point, $\hat{x}$, is that they both lie in [-4, -.9], so
the best we can do for an initial error is the width of the interval. Thus we take $|x_0 - \hat{x}| = 3.1$.

(ii) In 10 of section 2.2, we established the fact that $|f'(x)| < \frac{51}{98} < 1$. Hence, we have $M = \frac{51}{98}$.

Therefore, we know $|x_k - \hat{x}| \le 3.1 \cdot \left(\frac{51}{98}\right)^k$, and we need this quantity to be less than $10^{-11}$:

$$\left(\frac{51}{98}\right)^k < \frac{10^{-11}}{3.1} = \frac{1}{3.1(10)^{11}}$$
$$k \ln\left(\frac{51}{98}\right) < \ln\left(\frac{1}{3.1(10)^{11}}\right)$$
$$k > \frac{-\ln\left(3.1(10)^{11}\right)}{\ln\left(\frac{51}{98}\right)} \approx 40.51.$$

Hence, 41 iterations will suffice for any initial value in [-4, -.9].

Remark: The inequality must switch from < to > in the last step because we are dividing by $\ln\left(\frac{51}{98}\right)$,
which is negative.


(b) $x_0 = -4$,
$x_1 = f(x_0) = -2.35$,
$x_2 = f(x_1) \approx -1.541336633663366$,
$x_3 = f(x_2) \approx -1.167517670666227$,
$x_4 = f(x_3) \approx -1.028014489100897$,
$x_5 = f(x_4) \approx -1.001085950365354$,
$x_6 = f(x_5) \approx -1.00000176318809$, and
$x_7 = f(x_6) \approx -1.000000000004663$.

It takes 7 iterations to come up with an estimate within $10^{-11}$ of the actual fixed point, -1.
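
The count of 7 can be checked with a short loop. The formula for f is not restated in this solution, so the sketch below assumes $f(x) = \frac{3x^2 - 1}{6x + 4}$, which has the denominator $6x + 4$ mentioned in question 10 of section 2.2 and reproduces the values $f(-4) = -2.35$ and $f(-1) = -1$ given there. Treat that formula, and the names k and x, as assumptions made for illustration.

```octave
% Sketch: count iterations until x_k is within 1e-11 of the fixed point -1.
f = @(x) (3*x^2 - 1) / (6*x + 4);   % assumed form of f (see the note above)
x = -4;
k = 0;
while abs(x - (-1)) >= 1e-11 && k < 100
  x = f(x);
  k = k + 1;
end
k
x
```

This stopping test uses the known fixed point, which is only possible here because the exact answer is available; it is a check on the table, not a practical stopping rule.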


(c) The theoretical bound is 41 while the actual number of iterations is 7. The bound is nearly six times the 
actual! This is not a very tight bound. 


Remark: The bound is so loose because the derivative at the fixed point is zero. The
estimate of proposition 5 does not account for this case, where we know the convergence is quadratic
or better.


16a:

    n    p_n             a_n
    0    0.5             0.2586844276
    1    0.2004262431    0.2576132107
    2    0.2727490651    0.2575358323
    3    0.2536071566    0.2575306600
    4    0.2585503763    0.2575303107
    5    0.2572656363
    6    0.2575989852
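
The third column of this table is consistent with Aitken's delta-squared acceleration applied to the second column, $p_n - \frac{(p_{n+1} - p_n)^2}{p_{n+2} - 2p_{n+1} + p_n}$. The sketch below, which is an illustration and not code from the text, recomputes it from the tabulated $p_n$; the names d1, d2, and acc are made up. Since the inputs are rounded to ten digits, agreement in the last digit or two is not guaranteed.

```octave
% Sketch: Aitken's delta-squared values computed from the tabulated p_n.
p = [0.5, 0.2004262431, 0.2727490651, 0.2536071566, ...
     0.2585503763, 0.2572656363, 0.2575989852];
d1 = p(2:end-1) - p(1:end-2);                   % p_{n+1} - p_n
d2 = p(3:end) - 2*p(2:end-1) + p(1:end-2);      % p_{n+2} - 2 p_{n+1} + p_n
acc = p(1:end-2) - d1.^2 ./ d2;                 % accelerated values for n = 0..4
format long
disp(acc(:))
```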



20: The tenth iteration of Steffensen's method is 0.01462973293 while the eleventh is 0.009752946539, so it takes
but 11 iterations to reach a number below 0.01. This is an incredible acceleration of convergence: from 29,992
iterations to 11.


Section 2.4 


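8: Newton's (fixed point iteration) method requires iteration of the function $f(x) = x - \frac{g(x)}{g'(x)}$, so we need to know
$g'(x)$. The derivative of g is

$$g'(x) = -\frac{200\sin\left(\frac{10}{x}\right)}{x^3} - \frac{1000\cos\left(\frac{10}{x}\right)}{x^4}.$$

Therefore,

$$x_1 = 1.25 - \frac{g(1.25)}{g'(1.25)} = 1.25 - \frac{\frac{100}{1.25^2}\sin\left(\frac{10}{1.25}\right)}{-\frac{200\sin\left(\frac{10}{1.25}\right)}{1.25^3} - \frac{1000\cos\left(\frac{10}{1.25}\right)}{1.25^4}} \approx 2.76794916279264$$

and

$$x_2 = x_1 - \frac{g(x_1)}{g'(x_1)} = x_1 - \frac{\frac{100}{x_1^2}\sin\left(\frac{10}{x_1}\right)}{-\frac{200\sin\left(\frac{10}{x_1}\right)}{x_1^3} - \frac{1000\cos\left(\frac{10}{x_1}\right)}{x_1^4}} \approx 3.07240930016243.$$

Remark: Though it is not strictly needed, in its simplified form,

$$f(x) = x - \frac{g(x)}{g'(x)} = \frac{3x^2\sin\left(\frac{10}{x}\right) + 10x\cos\left(\frac{10}{x}\right)}{2x\sin\left(\frac{10}{x}\right) + 10\cos\left(\frac{10}{x}\right)}.$$

Therefore, $x_1 = \frac{3 \cdot 1.25^2\sin\left(\frac{10}{1.25}\right) + 10(1.25)\cos\left(\frac{10}{1.25}\right)}{2(1.25)\sin\left(\frac{10}{1.25}\right) + 10\cos\left(\frac{10}{1.25}\right)} \approx 2.76794916279264$, and $x_2$ may be computed using this
expression as well.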
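
The two Newton steps can be checked numerically. The statement of g is not repeated in this solution, so the sketch below assumes $g(x) = \frac{100}{x^2}\sin\left(\frac{10}{x}\right)$, which is the function whose derivative and Newton quotient match the expressions above. The handles g and gp are illustrative names.

```octave
% Sketch: two Newton steps from x_0 = 1.25 for the assumed g.
g  = @(x) 100*sin(10/x) / x^2;                          % assumed g (see the note above)
gp = @(x) -200*sin(10/x)/x^3 - 1000*cos(10/x)/x^4;      % its derivative
x0 = 1.25;
format long
x1 = x0 - g(x0)/gp(x0)
x2 = x1 - g(x1)/gp(x1)
```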

10a: The formula for the secant method is

$$x_{n+1} = x_n - g(x_n)\,\frac{x_n - x_{n-1}}{g(x_n) - g(x_{n-1})}.$$

When n = 1, we get $x_2 = x_1 - g(x_1)\frac{x_1 - x_0}{g(x_1) - g(x_0)}$, so in this example,

$$x_2 = 6 - g(6)\,\frac{6 - 5}{g(6) - g(5)} \approx 10.15086029699136.$$

When n = 2, we get $x_3 = x_2 - g(x_2)\frac{x_2 - x_1}{g(x_2) - g(x_1)}$, so in this example,

$$x_3 = x_2 - g(x_2)\,\frac{x_2 - 6}{g(x_2) - g(6)} \approx 8.43462052844703.$$


18: Since Newton's method is a fixed point iteration method, we may use the fixed point convergence theorem to
find such an interval. As indicated in exercise 26 on page 55, though, we are guaranteed convergence over any
neighborhood of the root where the iterated function f has a derivative with magnitude less than 1. To that
end, $f(x) = x - \frac{g(x)}{g'(x)} = x - \frac{x^4 + 2x^3 - x - 3}{4x^3 + 6x^2 - 1}$. Hence,

$$f'(x) = 1 - \frac{(4x^3 + 6x^2 - 1)^2 - (12x^2 + 12x)(x^4 + 2x^3 - x - 3)}{(4x^3 + 6x^2 - 1)^2} = \frac{(12x^2 + 12x)(x^4 + 2x^3 - x - 3)}{(4x^3 + 6x^2 - 1)^2}.$$

A graph of f' in the neighborhood of 1.097740792,



seems to indicate that $|f'(x)| < 1$ for all x from just about 0.9 to $\infty$. This is an acceptable answer, but if we
would like to be more precise about the lower bound and prove our assertion, there is considerable work to
do. First, the roots of $4x^3 + 6x^2 - 1$ are around -1.4, -0.5, and 0.4, so there are no asymptotes in the interval
under consideration, and f' is continuous there. To locate the lower end of this interval, we solve the equation
$f'(x) = -1$:

$$\frac{(12x^2 + 12x)(x^4 + 2x^3 - x - 3)}{(4x^3 + 6x^2 - 1)^2} = -1$$
$$(12x^2 + 12x)(x^4 + 2x^3 - x - 3) = -(4x^3 + 6x^2 - 1)^2$$
$$12x^6 + 36x^5 + 24x^4 - 12x^3 - 48x^2 - 36x = -16x^6 - 48x^5 - 36x^4 + 8x^3 + 12x^2 - 1$$
$$28x^6 + 84x^5 + 60x^4 - 20x^3 - 60x^2 - 36x + 1 = 0.$$

The real solutions of this equation are, in decreasing order, approximately 0.871748, 0.026590, -1.026590,
and -1.871748. A graph of $28x^6 + 84x^5 + 60x^4 - 20x^3 - 60x^2 - 36x + 1$ will point you in the right direction,
and Newton's method can be used to find these roots. The one we seek is 0.871748. This value marks the
lower end of the desired interval. To verify that the interval is unbounded above, we solve $f'(x) = 1$:

$$\frac{(12x^2 + 12x)(x^4 + 2x^3 - x - 3)}{(4x^3 + 6x^2 - 1)^2} = 1$$
$$(12x^2 + 12x)(x^4 + 2x^3 - x - 3) = (4x^3 + 6x^2 - 1)^2$$
$$12x^6 + 36x^5 + 24x^4 - 12x^3 - 48x^2 - 36x = 16x^6 + 48x^5 + 36x^4 - 8x^3 - 12x^2 + 1$$
$$0 = 4x^6 + 12x^5 + 12x^4 + 4x^3 + 36x^2 + 36x + 1.$$

The real solutions of this equation are, in decreasing order, approximately -0.028593 and -0.971407. Again,
a graph will point you in the right direction, and Newton's method can be used to find these roots. There
are no solutions of $f'(x) = \pm 1$ greater than the root 1.097740792. We conclude that $|f'(x)| < 1$ for all
$x \in (0.87175, \infty)$, so Newton's method will converge to $x \approx 1.097740792$ for any initial value in $(0.87175, \infty)$.
Finally, by looking at the graph of $f(x)$,

we see that the interval from the asymptote around 0.4 to the root maps into the interval from the root to
infinity. Therefore, Newton's method converges to 1.097740792 for all initial values between the asymptote
near 0.4 and 0.87175 as well. Finally, we use Newton's method to get a more accurate value for the asymptote
near 0.4. It turns out to be 0.366025403784439, so we conclude that Newton's method will converge to the
root $x \approx 1.097740792$ for any initial value in $(0.36602540378444, \infty)$.

Remark: Depending on how rigorously you want your answer shown, you may start with the graph of f as
above, approximate the asymptote near 0.4, and proceed straight to the final answer. This conclusion
can be justified (graphically) by assuming that the graph of f is more or less linear to the right of the
part shown and imagining the web diagram for any value in this interval. To make this argument slightly
more rigorous, note that f has a slant asymptote, $y = \frac{3}{4}x - \frac{1}{8}$, as x approaches $\infty$, so the assumption that
the graph of f is more or less a straight line to the right of the part shown is valid.
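
The real solutions of $f'(x) = -1$ quoted in this solution can be cross-checked without running Newton's method by hand. The sketch below uses Octave's built-in roots function on the degree 6 polynomial; this is a swapped-in shortcut for verification, not the approach the solution describes, and the names c, r, realRoots, and lowerEnd are illustrative.

```octave
% Sketch: real roots of 28x^6 + 84x^5 + 60x^4 - 20x^3 - 60x^2 - 36x + 1.
c = [28, 84, 60, -20, -60, -36, 1];      % highest-degree coefficient first, as roots() expects
r = roots(c);
realRoots = sort(real(r(abs(imag(r)) < 1e-9)), 'descend')
lowerEnd = max(realRoots)                % lower end of the interval where |f'(x)| < 1
```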


21 : 



26: The sum of two numbers, call them x and y, is 20, so $x + y = 20$. If each number is added to its square root,
the product of the two sums is 172.2, so $(x + \sqrt{x})(y + \sqrt{y}) = 172.2$. Hence, we need to solve the system

$$x + y = 20$$
$$(x + \sqrt{x})(y + \sqrt{y}) = 172.2$$

of two equations with two unknowns. Since this system is not linear, our best hope is to use substitution.
The first equation gives us $y = 20 - x$. Substituting this value of y in the second equation gives us

$$(x + \sqrt{x})(20 - x + \sqrt{20 - x}) = 172.2$$

or $(x + \sqrt{x})(20 - x + \sqrt{20 - x}) - 172.2 = 0$. It is a solution of this last equation we seek. Without having
any idea what the roots might be besides the reasonable assumption that they are between 0 and 20, it is
not clear what initial values to use. With a few different attempts, you are likely to find some that work.
For example, applying the secant method to $g(x) = (x + \sqrt{x})(20 - x + \sqrt{20 - x}) - 172.2$ with $x_0 = 9$ and
$x_1 = 10$ gives 9.149620618, which is accurate to all digits shown, in just 9 iterations. The other number is
$20 - 9.149620618 = 10.850379382$. We can verify this is a solution by calculating

$$(9.149620618 + \sqrt{9.149620618})(10.850379382 + \sqrt{10.850379382})$$

which is very nearly 172.2.
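
A minimal secant loop for this equation might look like the following. It is a sketch, not the secant routine from the text: the stopping rule, the iteration cap, and the variable names (xPrev, xCurr, denom, check) are choices made for illustration, so the iteration count it needs may differ from the 9 quoted above.

```octave
% Sketch: secant method on g(x) = (x + sqrt(x))(20 - x + sqrt(20 - x)) - 172.2.
g = @(x) (x + sqrt(x)) * (20 - x + sqrt(20 - x)) - 172.2;
xPrev = 9; xCurr = 10;
for n = 1:50
  denom = g(xCurr) - g(xPrev);
  if denom == 0
    break;                          % horizontal secant line; stop rather than divide by zero
  end
  xNext = xCurr - g(xCurr) * (xCurr - xPrev) / denom;
  xPrev = xCurr;
  xCurr = xNext;
  if abs(xCurr - xPrev) < 1e-12
    break;                          % successive estimates agree; accept
  end
end
format long
x = xCurr
y = 20 - x
check = (x + sqrt(x)) * (y + sqrt(y))   % should be very nearly 172.2
```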

27: Newton's method will fail to find a root of g on the second iteration if $g'(x_1) = 0$. For example, let
$g(x) = x^3 - 3x + 3$. Then $g'(x) = 3x^2 - 3$ has zeroes when $x = \pm 1$. So we need a value $x_0$ such that $x_1 = 1$ or
$x_1 = -1$. We need to find any solution of $x_1 = x_0 - \frac{g(x_0)}{g'(x_0)} = x_0 - \frac{x_0^3 - 3x_0 + 3}{3x_0^2 - 3} = \pm 1$. One such solution follows.

$$x - \frac{x^3 - 3x + 3}{3x^2 - 3} = 1$$
$$\frac{2x^3 - 3}{3x^2 - 3} = 1$$
$$2x^3 - 3 = 3x^2 - 3$$
$$2x^3 - 3x^2 = 0$$
$$x^2(2x - 3) = 0$$

so either of the initial values $x_0 = 0$ or $x_0 = \frac{3}{2}$ will produce the desired result.
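
Both initial values are easy to confirm numerically. The following sketch (illustrative names g, gp, x0, x1) takes one Newton step from each and evaluates the derivative at the result; in both cases the arithmetic is exact in floating point, so $x_1$ is exactly 1 and $g'(x_1)$ is exactly 0.

```octave
% Sketch: one Newton step from x_0 = 0 and from x_0 = 3/2 lands on x_1 = 1, where g' vanishes.
g  = @(x) x^3 - 3*x + 3;
gp = @(x) 3*x^2 - 3;
for x0 = [0, 3/2]
  x1 = x0 - g(x0)/gp(x0);
  printf('x0 = %g  ->  x1 = %g,  g''(x1) = %g\n', x0, x1, gp(x1));
end
```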


Remark: The equation $x - \frac{x^3 - 3x + 3}{3x^2 - 3} = -1$ has only one real solution, but it is irrational. It is, accurate to 20
significant digits, 1.0786168885087585968. Setting $x_0 = 1.078616888508759$ as in the following Octave
code does not fail, however! There is enough round-off error that $x_1$ is not exactly -1 and $g'(x_1)$ is not
exactly zero, so the method proceeds to find the result. It takes 99 iterations to settle in on the solution,
but it gets there. $x_1$ displays as -0.999999999999999 and $x_2$ displays as 7.50599937895082e+14.


format('long')
f=inline('x^3-3*x+3')
fp=inline('3*x^2-3')
x0=1.0786168885087585968
c=1;
for i=1:120
  x=x0-f(x0)/fp(x0)
  if (abs(x-x0)<1e-15)
    c
    return
  end%if
  x0=x;
  c=c+1;
end%for


Section 2.5 

1: Before trying to match any functions with their diagrams, we take stock of the functions available. f and h
are polynomials of degree 5 and, therefore, have at most 5 distinct roots. l is the product of the natural
logarithm with a third degree polynomial. The polynomial has three roots and the logarithm has one distinct
from those of the polynomial, so l has four roots. Now looking at the diagrams, we can match two functions
with their diagrams. Diagram (d) has patches of nine different colors, indicating nine roots within the area
shown. Since functions f, h, and l have fewer than 9 roots, function g must match with diagram (d). Along
the same lines, diagrams (a) and (b) both show 5 roots, so l cannot match either of those. l has only four
roots. By process of elimination, function l matches with diagram (c). That leaves (a) and (b) to match with
f and h. Both diagrams show 5 roots, but there is a fundamental difference between the two. The real axis
passes horizontally through the center of each diagram. Diagram (a) has one patch covering the entire real
axis, indicating only one real root, while diagram (b) has three patches covering the real axis, indicating three
real roots. The graph of f,

clearly shows that f has three roots, so f matches with (b) and h matches with (a). To recap,

    f ↔ (b)
    g ↔ (d)
    h ↔ (a)
    l ↔ (c).

3c: For each root r, the polynomial must have a factor of $(x - r)$ and no other factors. This polynomial must have
factors of $(x - (-4))$, $(x - (-1))$, $(x - 2)$, $(x - 2i)$, and $(x - (-2i))$, making $p(x) = (x+4)(x+1)(x-2)(x-2i)(x+2i)$
one solution.

Remark: $q(x) = a(x+4)(x+1)(x-2)(x-2i)(x+2i)$ where a is any nonzero complex number is another
solution.

Remark: Though it is not necessary to multiply the factors, $p(x) = x^5 + 3x^4 - 2x^3 + 4x^2 - 24x - 32$.

3d: For each root r, the polynomial must have a factor of $(x - r)$ and no other factors. This polynomial must have
factors of $(x - (-4))$, $(x - (-1))$, $(x - 2)$, and $(x - 2i)$, making $p(x) = (x+4)(x+1)(x-2)(x-2i)$ one solution.

Remark: $q(x) = a(x+4)(x+1)(x-2)(x-2i)$ where a is any nonzero complex number is another solution.
Remark: Though it is not necessary to multiply the factors, $p(x) = x^4 + (3-2i)x^3 - (6+6i)x^2 - (8-12i)x + 16i$.
Notice that not all the coefficients are real numbers. This is consistent with the conjugate roots theorem
stating that if a polynomial with real coefficients has complex roots, they must come in conjugate pairs.
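
The expanded forms in 3c and 3d can be checked with Octave's built-in poly function, which returns the coefficients (highest degree first) of the monic polynomial with the given roots. This is a verification sketch; the variable names p and q are illustrative and are not the same objects as the q(x) in the remarks.

```octave
% Sketch: expand the products of linear factors for 3c and 3d.
p = poly([-4, -1, 2, 2i, -2i])    % expect 1  3  -2  4  -24  -32
q = poly([-4, -1, 2, 2i])         % expect 1, 3-2i, -(6+6i), -(8-12i), 16i
```

The second result has non-real coefficients, as expected when a complex root appears without its conjugate.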

7: f is periodic and has infinitely many roots regularly spread across the real axis. The only diagram showing roots
of this nature is (a), so f matches with (a). g and f differ only by a small amount for large real values, so we
should expect to see infinitely many more or less regularly spaced roots on the positive real axis. The only
diagram with roots of this nature is (d), so g matches with (d). l is a fifth degree polynomial, so it has at most
5 roots. Diagram (b) shows 8 colors, so 8 roots. Therefore, h matches with (b) and l matches with (c). To
recap,

    f ↔ (a)
    g ↔ (d)
    h ↔ (b)
    l ↔ (c).


Section 2.6

6a: g(2) = 38 and g'(2) = 71:

    2 | 3    12   -13    -8
      |       6    36    46
      -----------------------
    2 | 3    18    23  | 38
      |       6    48
      -----------------
        3    24  | 71
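
8a: From 6a, g(2) = 38 and g'(2) = 71, so $x_1 = 2 - \frac{38}{71} = \frac{104}{71}$. $g\left(\frac{104}{71}\right) = \frac{2911104}{357911}$ and $g'\left(\frac{104}{71}\right) = \frac{209027}{5041}$:

    104/71 | 3    12         -13             -8
           |      312/71     121056/5041     5774392/357911
           --------------------------------------------------
    104/71 | 3    1164/71    55523/5041    | 2911104/357911
           |      312/71     153504/5041
           ---------------------------------
             3    1476/71  | 209027/5041

so $x_2 = \frac{104}{71} - \frac{2911104/357911}{209027/5041} = \frac{2689672}{2120131} \approx 1.268634815490175$.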
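
The two synthetic divisions in each table amount to evaluating g and g' by nesting. The sketch below is an illustration and not the horner or newtonhorner routines used elsewhere in these solutions; it only borrows their convention of listing the constant term first. It carries out both Newton steps for $g(x) = 3x^3 + 12x^2 - 13x - 8$.

```octave
% Sketch: Horner evaluation of g and g' followed by a Newton step, done twice.
c = [-8, -13, 12, 3];            % g(x) = 3x^3 + 12x^2 - 13x - 8, constant term first
x = 2;
for step = 1:2
  p = c(end);                    % running value of g(x)
  dp = 0;                        % running value of g'(x)
  for k = numel(c)-1:-1:1
    dp = dp*x + p;               % second synthetic division row
    p  = p*x + c(k);             % first synthetic division row
  end
  printf('g(%.15g) = %.15g,  g''(%.15g) = %.15g\n', x, p, x, dp);
  x = x - p/dp;                  % Newton step
end
format long
x
```

The first pass prints g(2) = 38 and g'(2) = 71, matching 6a, and the final x is the decimal value of $x_2$.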


14a: newtonhorner([-144,144,-59,6,1],1,1e-5,100) returns ans = 3.

    3 | 1    6   -59    144   -144
      |      3    27    -96    144
      ------------------------------
        1    9   -32     48  |   0

so the deflated polynomial is $x^3 + 9x^2 - 32x + 48$. newtonhorner([48,-32,9,1],3,1e-5,100) returns ans = -12.

    -12 | 1     9   -32    48
        |     -12    36   -48
        -----------------------
          1    -3     4  |   0

so the deflated polynomial is $x^2 - 3x + 4$, which is quadratic. The quadratic formula gives the remaining roots,
$\frac{3 \pm \sqrt{9 - 4(4)}}{2}$, that is, $\frac{3 + i\sqrt{7}}{2}$ and $\frac{3 - i\sqrt{7}}{2}$. So the four roots are 3, -12, $\frac{3 + i\sqrt{7}}{2}$, and $\frac{3 - i\sqrt{7}}{2}$.
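
Deflation by a known root is one pass of synthetic division. The sketch below is not the deflate function used in these solutions (its source is not shown here); it is a bare loop that follows the same constant-term-first convention, with illustrative names b, remainder, and deflated.

```octave
% Sketch: divide x^4 + 6x^3 - 59x^2 + 144x - 144 by (x - 3) using synthetic division.
c = [-144, 144, -59, 6, 1];      % constant term first
r = 3;                           % root found by Newton's method
n = numel(c);
b = zeros(1, n);
b(n) = c(n);                     % bring down the leading coefficient
for k = n-1:-1:1
  b(k) = c(k) + r*b(k+1);        % multiply by r and add the next coefficient
end
remainder = b(1)                 % 0 when r is a root
deflated = b(2:n)                % quotient coefficients, constant term first
```

Running the same loop on the deflated array with r = -12 gives the quadratic $x^2 - 3x + 4$.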

15a: format('long'); c=[-40,16,-12,-2,1]; newtonhorner(c,1,1e-5,100) returns

ans = -3.54823289798023

so -3.54823289798023 is one root. c=deflate(c,ans) returns

-11.27321716194279 7.68642249426964 -5.54823289798023 1.00000000000000

so the deflated polynomial is approximately $x^3 - 5.5482x^2 + 7.6864x - 11.2732$ and the coefficients of this polynomial
are now contained in array c. newtonhorner(c,-3.5,1e-5,100) returns ans = 4.38111344099655
so 4.38111344099655 is another root. c=deflate(c,ans) returns

c =

2.57313975402986 -1.16711945698368 1.00000000000000

so the deflated polynomial is approximately $x^2 - 1.1671x + 2.5731$ and the coefficients of this polynomial are
now contained in array c. Since we have deflated the polynomial to a quadratic, we find the last two roots
using the quadratic formula. [s,t]=quadraticRoots(c(3),c(2),c(1)) returns

s = 0.583559728491838 + 1.494188006012761i

t = 0.583559728491838 - 1.494188006012761i.

To recap, the roots are

-3.54823289798023

4.38111344099655

0.583559728491838 + 1.494188006012761i

0.583559728491838 - 1.494188006012761i.

16a: c=[-40,16,-12,-2,1]; newtonhorner(c,-3.54823289798023,1e-5,100)

returns

ans = -3.54823289797970.

c=[-40,16,-12,-2,1]; newtonhorner(c,4.38111344099655,1e-5,100)

returns

ans = 4.38111344099594.

c=[-40,16,-12,-2,1]; newtonhorner(c,0.583559728491838+1.494188006012761i,1e-5,100)

returns

ans = 0.583559728491879 + 1.494188006011256i.

c=[-40,16,-12,-2,1]; newtonhorner(c,0.583559728491838-1.494188006012761i,1e-5,100)

returns

ans = 0.583559728491879 - 1.494188006011256i.

Each attempt to refine the roots returns a slightly different answer, but none change within the first five
decimal places. The approximate roots of the approximate deflated polynomials are all within $10^{-5}$ of the
exact roots of the original polynomial without refinement.

19a: (i) format('long'); horner(sqrt(3),[-40,16,-12,-2,1]) returns ans = -49.6794919243112. Notice we
only get the value of the polynomial, the first entry of the array of return values. This is the default behavior
if the function is not set equal to an array.

(ii) p=inline('x^4-2*x^3-12*x^2+16*x-40'); p(sqrt(3)) returns ans = -49.6794919243112 so they
certainly look like they are returning the same value.

(iii) horner(sqrt(3),[-40,16,-12,-2,1]) == p(sqrt(3)) returns ans = 0, however, so internally, the
results are not exactly the same! We can conclude that the inline function evaluation is not done by nesting
(synthetic division).

Remark: horner(3,[-40,16,-12,-2,1]) == p(3) returns ans = 1, so for the integer input 3, the two
methods do result in exactly the same value.


Section 3.2 


3c: We begin by constructing three polynomials: the first with roots at the second two data points and a value
of 1 at the first, the second with roots at the first and third data points and a value of 1 at the
second, and the third with roots at the first and second data points and a value of 1 at the third.
Those polynomials are

$$l_0(x) = \frac{(x - 20)(x - 1019)}{(-10 - 20)(-10 - 1019)}$$
$$l_1(x) = \frac{(x + 10)(x - 1019)}{(20 + 10)(20 - 1019)}$$
$$l_2(x) = \frac{(x + 10)(x - 20)}{(1019 + 10)(1019 - 20)}.$$

We then multiply $l_i$ by $y_i$ and sum the products:

$$P_2(x) = \frac{(x - 20)(x - 1019)}{(-10 - 20)(-10 - 1019)}(10) + \frac{(x + 10)(x - 1019)}{(20 + 10)(20 - 1019)}(58) + \frac{(x + 10)(x - 20)}{(1019 + 10)(1019 - 20)}(-32).$$

4c: Estimating (or approximating) the value of a function f using an interpolating polynomial means to evaluate
the polynomial there instead.

$$f(1.3) \approx P_2(1.3) = \frac{(1.3 - 20)(1.3 - 1019)}{(-10 - 20)(-10 - 1019)}(10) + \frac{(1.3 + 10)(1.3 - 1019)}{(20 + 10)(20 - 1019)}(58) + \frac{(1.3 + 10)(1.3 - 20)}{(1019 + 10)(1019 - 20)}(-32) \approx 28.427$$
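
The same estimate can be computed directly from the data. The sketch below evaluates the Lagrange form at 1.3 for the three data points (-10, 10), (20, 58), and (1019, -32); it is an illustration with made-up variable names (xs, ys, others, li), not a routine from the text.

```octave
% Sketch: evaluate the Lagrange interpolating polynomial at x = 1.3.
xs = [-10, 20, 1019];
ys = [10, 58, -32];
x = 1.3;
P = 0;
for i = 1:numel(xs)
  others = xs([1:i-1, i+1:end]);                    % every node except xs(i)
  li = prod(x - others) / prod(xs(i) - others);     % value of the i-th basis polynomial at x
  P = P + ys(i) * li;
end
P
```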


5c: Neville's method is best executed on a computer or in a tabular format. $f(1.3) \approx P_{0,2}(1.3)$. The tabular format
is shown here:

    x_i      P_{i,0} = y_i    P_{i,1}    P_{i,2}
    -10      10               28.08      28.427
    20       58               59.684
    1019     -32

$$P_{0,1} = \frac{(1.3 - 20)(10) - (1.3 + 10)(58)}{(-10 - 20)} = 28.08$$
$$P_{1,1} = \frac{(1.3 - 1019)(58) - (1.3 - 20)(-32)}{(20 - 1019)} = 59.684$$
$$P_{0,2} = \frac{(1.3 - 1019)P_{0,1} - (1.3 + 10)P_{1,1}}{(-10 - 1019)} \approx 28.427$$

7: Since the interpolating polynomial error term contains the product $(x - x_0)(x - x_1) \cdots (x - x_n)$, we should
choose data near the point of estimation x. This way, the product is minimized and we arrive at what is
likely to be the best approximation possible with the given data. It does not always work this way (perhaps
it would make a good exercise to find an example where using the data nearest the point of estimation does
not give the best estimate) but we have the best chance of good results this way. For the degree at most 1
polynomial, we will use the data at 2 and 3.5 since these are the two abscissas nearest 3. For the degree at
most 2 polynomial, we will use the data at 2, 3.5, and 4 since these are the three abscissas nearest 3. For the
degree at most 3 polynomial we have no choice but to use all of the data. Here is where Neville's method
shines! The first estimate uses the first two data points. The second estimate uses these same two plus a
third. The last estimate uses these three plus a fourth. We can reuse each of the first two calculations in the
next by creating a single Neville's method table. With the data in the table in the order in which we would
like to use them, we get

    x_i     P_{i,0} = y_i    P_{i,1}    P_{i,2}    P_{i,3}
    2       .8               .7333      .6917      .6389
    3.5     .7               .65        .5333
    4       .75              1
    5       .5

$P_{0,1}$ gives the at most degree 1 estimate, $P_{0,2}$ gives the at most degree 2 estimate, and $P_{0,3}$ gives the at most
degree 3 estimate. Entries shown with four decimal places are rounded.

(a) $P_{0,1}(3) = \frac{(3 - 3.5)(.8) - (3 - 2)(.7)}{2 - 3.5} \approx .7333$

(b) $P_{1,1}(3) = \frac{(3 - 4)(.7) - (3 - 3.5)(.75)}{3.5 - 4} = .65$; $P_{0,2}(3) = \frac{(3 - 4)P_{0,1}(3) - (3 - 2)(.65)}{2 - 4} \approx .6917$

(c) $P_{2,1}(3) = \frac{(3 - 5)(.75) - (3 - 4)(.5)}{4 - 5} = 1$; $P_{1,2}(3) = \frac{(3 - 5)(.65) - (3 - 3.5)(1)}{3.5 - 5} \approx .5333$; $P_{0,3}(3) = \frac{(3 - 5)P_{0,2}(3) - (3 - 2)P_{1,2}(3)}{2 - 5} \approx .6389$

8b: Since the interpolating polynomial error term contains the product $(x - x_0)(x - x_1) \cdots (x - x_n)$, we should
choose data near the point of estimation x. This way, the product is minimized and we arrive at what is 
likely to be the best approximation possible with the given data. It does not always work this way (perhaps 
it would make a good exercise to find an example where using the data nearest the point of estimation does 
not give the best estimate) but we have the best chance of good results this way. For the degree at most 1 
polynomial, we will use the data at .1 and .2 since these are the two abscissas nearest .18. For the degree at
most 2 polynomial, we will use the data at .1, .2, and .3 since these are the three abscissas nearest .18. For 
the degree at most 3 polynomial we have no choice but to use all of the data. Here is where Neville’s method 
shines! The first estimate uses the first two data points. The second estimate uses these same two plus a 
third. The last estimate uses these three plus a fourth. We can reuse each of the first two calculations in the 
next by creating a single Neville’s method table. With the data listed in the Octave function in the order in 
which we would like to use them, we get 

>> nevilles(.18, [.1, .2, .3, .4], [-.29004986, -.56079734, -.81401972, -1.0526302])
ans =

  -0.290049860000000  -0.506647844000000  -0.508049852000000  -0.508143074400000
  -0.560797340000000  -0.510152864000000  -0.508399436000000   0.000000000000000
  -0.814019720000000  -0.527687144000000   0.000000000000000   0.000000000000000
  -1.052630200000000   0.000000000000000   0.000000000000000   0.000000000000000



For the interpolating polynomial of degree at most one, $f(.18) \approx P_{0,1}(.18) = -.506647844$. For the interpolating polynomial of degree at most two, $f(.18) \approx P_{0,2}(.18) = -.508049852$. For the interpolating polynomial of degree at most three, $f(.18) \approx P_{0,3}(.18) = -.5081430744$.
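
The table above can be reproduced with a short sketch of Neville's method. This is written in Python rather than the book's Octave, and the function name `neville_table` is a hypothetical stand-in for the book's `nevilles`; the layout (row $i$, column $j$ holds the estimate built on nodes $i, \ldots, i+j$) is assumed to match the output shown.

```python
def neville_table(x, xs, ys):
    """Return Neville's table Q for the point x.

    Q[i][j] is the value at x of the polynomial interpolating the data at
    nodes i, i+1, ..., i+j. Unused entries are left as zeros, as in the
    output shown above.
    """
    n = len(xs)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Q[i][0] = ys[i]
    for j in range(1, n):
        for i in range(n - j):
            Q[i][j] = ((x - xs[i + j]) * Q[i][j - 1]
                       - (x - xs[i]) * Q[i + 1][j - 1]) / (xs[i] - xs[i + j])
    return Q


table = neville_table(0.18, [0.1, 0.2, 0.3, 0.4],
                      [-0.29004986, -0.56079734, -0.81401972, -1.0526302])
# The first row holds the degree at most 1, 2, and 3 estimates.
estimates = table[0][1:]
```

Because the data are listed in the order we want to use them, every estimate we need sits in the first row, and each one reuses the work done for the previous one.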

8c: Since the interpolating polynomial error term contains the product $(x - x_0)(x - x_1) \cdots (x - x_n)$, we should
choose data near the point of estimation x. This way, the product is minimized and we arrive at what is 
likely to be the best approximation possible with the given data. It does not always work this way (perhaps 
it would make a good exercise to find an example where using the data nearest the point of estimation does 
not give the best estimate) but we have the best chance of good results this way. For the degree at most 1 
polynomial, we will use the data at 2 and 2.5 since these are the two abscissas nearest 2.26. For the degree 
at most 2 polynomial, we will use the data at 2, 2.5, and 1.5 since these are the three abscissas nearest 2.26. 
For the degree at most 3 polynomial we have no choice but to use all of the data. Here is where Neville’s 
method shines! The first estimate uses the last two data points. The second estimate uses these same two 
plus a third. The final uses these three plus a fourth. We can reuse each of the first two calculations in the 
next by creating a single Neville’s method table. With the data listed in the Octave function in the order in 
which we would like to use them, we get 

>> nevilles(2.26, [2, 2.5, 1.5, 1], [-1.329, 1.776, -2.569, 1.654])
ans =

  -1.32900   0.28560   0.05285   0.28036
   1.77600   0.73320  -0.82219   0.00000
  -2.56900  -8.98796   0.00000   0.00000
   1.65400   0.00000   0.00000   0.00000



For the interpolating polynomial of degree at most one, $f(2.26) \approx P_{0,1}(2.26) = .28560$. For the interpolating polynomial of degree at most two, $f(2.26) \approx P_{0,2}(2.26) = .05285$. For the interpolating polynomial of degree at most three, $f(2.26) \approx P_{0,3}(2.26) = .28036$.

9a: Since the interpolating polynomial error term contains the product $(x - x_0)(x - x_1) \cdots (x - x_n)$, we should
choose data near the point of estimation x. This way, the product is minimized and we arrive at what is likely 
to be the best approximation possible with the given data. It does not always work this way (perhaps it would 
make a good exercise to find an example where using the data nearest the point of estimation does not give 
the best estimate) but we have the best chance of good results this way. For the degree at most 1 polynomial, 
we will use the data at 1.25 and 1.6 since these are the two abscissas nearest 1.4. For the degree at most 
2 polynomial, we have no choice but to use all of the data. We can use Neville’s method or the Lagrange form in this case. Neither method provides obvious advantage over the other. To begin, $f(1) = \sin \pi = 0$; $f(1.25) = \sin(1.25\pi) \approx -.70711$; $f(1.6) = \sin(1.6\pi) \approx -.95106$.

Lagrange form: (degree at most 1) $L_1(x) = \frac{x-1.6}{1.25-1.6}(-.70711) + \frac{x-1.25}{1.6-1.25}(-.95106)$ so

$$f(1.4) \approx L_1(1.4) = \frac{1.4-1.6}{1.25-1.6}(-.70711) + \frac{1.4-1.25}{1.6-1.25}(-.95106) = -.81166.$$

(degree at most 2) $L_2(x) = \frac{(x-1.25)(x-1.6)}{(1-1.25)(1-1.6)}(0) + \frac{(x-1)(x-1.6)}{(1.25-1)(1.25-1.6)}(-.70711) + \frac{(x-1)(x-1.25)}{(1.6-1)(1.6-1.25)}(-.95106)$ so

$$f(1.4) \approx L_2(1.4) = \frac{(1.4-1)(1.4-1.6)}{(1.25-1)(1.25-1.6)}(-.70711) + \frac{(1.4-1)(1.4-1.25)}{(1.6-1)(1.6-1.25)}(-.95106) = -.918232.$$

Neville’s Method: We use the same table for both the degree at most 1 and degree at most 2 polynomials:

  $x_i$    $P_{i,0}$    $P_{i,1}$            $P_{i,2}$
  1.25     -.70711      .16414 - .697x       3.5524x^2 - 10.82134x + 7.26894
  1.6      -.95106      1.5851 - 1.5851x
  1        0

$$P_{0,1}(x) = \frac{(x-1.6)(-.70711) - (x-1.25)(-.95106)}{1.25-1.6} = .16414 - .697x$$

$$P_{1,1}(x) = \frac{(x-1)(-.95106)}{1.6-1} = 1.5851 - 1.5851x$$

$$P_{0,2}(x) = \frac{(x-1)P_{0,1}(x) - (x-1.25)P_{1,1}(x)}{1.25-1} = 3.5524x^2 - 10.82134x + 7.26894$$

(degree at most 1) $P_{0,1}(1.4) = .16414 - .697(1.4) = -.81166$

(degree at most 2) $P_{0,2}(1.4) = 3.5524(1.4)^2 - 10.82134(1.4) + 7.26894 = -.918232$

10a: (degree at most 1) $f(1.4) - P_1(1.4) = -\frac{\pi^2 \sin(\pi\xi)}{2}(1.4 - 1.25)(1.4 - 1.6)$ so our bound is

$$|f(1.4) - P_1(1.4)| \le .015 \max_{x \in [1.25,1.6]} |\pi^2 \sin \pi x| = .015\pi^2 |\sin(1.5\pi)| < .149.$$

The actual absolute error is $|f(1.4) - P_1(1.4)| = |\sin(1.4\pi) + .81166| \approx .139$, which is rather near the bound.

(degree at most 2) $f(1.4) - P_2(1.4) = -\frac{\pi^3 \cos(\pi\xi)}{6}(1.4 - 1.25)(1.4 - 1.6)(1.4 - 1)$ so our bound is

$$|f(1.4) - P_2(1.4)| \le .002 \max_{x \in [1,1.6]} |\pi^3 \cos \pi x| = .002\pi^3 < .0621.$$

The actual absolute error is $|f(1.4) - P_2(1.4)| = |\sin(1.4\pi) + .918232| \approx .0328$, which is of the same order of
magnitude as the bound.
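
A quick numerical check of 9a and 10a, sketched in Python rather than the book's Octave. The helper `lagrange_eval` is a hypothetical name introduced only for this illustration; it evaluates the Lagrange form directly at one point using full-precision data, so its results differ from the five-digit hand computations in the last digit or two.

```python
import math


def lagrange_eval(x, xs, ys):
    """Evaluate the Lagrange form of the interpolating polynomial at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def f(t):
    return math.sin(math.pi * t)


x = 1.4
p1 = lagrange_eval(x, [1.25, 1.6], [f(1.25), f(1.6)])
p2 = lagrange_eval(x, [1.0, 1.25, 1.6], [f(1.0), f(1.25), f(1.6)])

# Actual errors and the bounds from the interpolation error term.
err1 = abs(f(x) - p1)
err2 = abs(f(x) - p2)
bound1 = abs((x - 1.25) * (x - 1.6)) / 2 * math.pi ** 2
bound2 = abs((x - 1.0) * (x - 1.25) * (x - 1.6)) / 6 * math.pi ** 3
```

Both actual errors land below their bounds, which is all the error term promises.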


Section 3.3 

4: The Newton form of an interpolating polynomial follows from a table of divided differences. Recursion 3.3.3 is 
used to compute the entries in the table, as in Table 3.3. Answers will depend on the order in which the data 
are listed in the table and on how the data are read from the table. Placing the data in the table in the order 
given in the question, we have: 


  $x_i$   $f_{i,0}$   $f_{i,1}$   $f_{i,2}$   $f_{i,3}$
  1       2           0           -1          2/3
  2       2           -2          1
  3       0           0
  4       0

Reading the coefficients across the first row, we use $f_{0,0}$, $f_{0,1}$, $f_{0,2}$, and $f_{0,3}$. This is a valid sequence to read
from the table since each coefficient depends on the same data as the previous plus one point. $f_{0,0}$ depends
on $x_0$; $f_{0,1}$ depends on $x_0$ and $x_1$; $f_{0,2}$ depends on $x_0, x_1$, and $x_2$; and $f_{0,3}$ depends on $x_0, x_1, x_2$, and $x_3$.
Therefore, one answer is

$$P_{0,3}(x) = 2 + 0(x - 1) - 1(x - 1)(x - 2) + \frac{2}{3}(x - 1)(x - 2)(x - 3) = 2 - (x - 1)(x - 2) + \frac{2}{3}(x - 1)(x - 2)(x - 3).$$

The sequence of coefficients $f_{1,0}$, $f_{2,1}$, $f_{1,2}$, $f_{0,3}$ is not a valid sequence to choose. $f_{1,0}$ depends on $x_1$ but
$f_{2,1}$ depends on $x_2$ and $x_3$, two completely different data values from the first. With some study, you might
be able to draw the conclusion, and maybe even prove, that any sequence of coefficients starting in the first
column and progressing to the right one column at a time and either jumping up one row or remaining in
the same row with each change of column forms a valid sequence. For example, we can use coefficients $f_{2,0}$,
$f_{1,1}$, $f_{1,2}$, $f_{0,3}$ because $f_{2,0}$ depends on $x_2$; $f_{1,1}$ depends on $x_2$ and $x_1$; $f_{1,2}$ depends on $x_2, x_1$, and $x_3$; and
$f_{0,3}$ depends on $x_2, x_1, x_3$, and $x_0$. And the order in which new dependencies are encountered matters. The
$(x - x_i)$ monomials must appear in the same order. Therefore, another answer is

$$P_{0,3}(x) = 0 - 2(x-3) + 1(x-3)(x-2) + \frac{2}{3}(x-3)(x-2)(x-4) = -2(x-3) + (x-3)(x-2) + \frac{2}{3}(x-3)(x-2)(x-4).$$

Other possible answers garnered from this same divided difference table are

$$P_{0,3}(x) = (x-4)(x-3) + \frac{2}{3}(x-4)(x-3)(x-2)$$

$$P_{0,3}(x) = -2(x-3) - (x-3)(x-2) + \frac{2}{3}(x-3)(x-2)(x-1).$$

With some algebra and a bit of patience, each of the four forms above can be reduced to

$$P_{0,3}(x) = \frac{2}{3}x^3 - 5x^2 + \frac{31}{3}x - 4.$$
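
If the algebra and patience run short, the claim that all four forms are the same polynomial can be checked with exact arithmetic. The following is a sketch in Python (not the book's Octave); `divided_differences` and `newton_eval` are hypothetical helper names, and `table[i][j]` is assumed to play the role of $f_{i,j}$.

```python
from fractions import Fraction


def divided_differences(xs, ys):
    """Return rows of the divided difference table: table[i][j] = f_{i,j}."""
    n = len(xs)
    table = [[Fraction(y)] for y in ys]
    for j in range(1, n):
        for i in range(n - j):
            table[i].append((table[i + 1][j - 1] - table[i][j - 1])
                            / (xs[i + j] - xs[i]))
    return table


def newton_eval(x, coeffs, nodes):
    """Evaluate sum_k coeffs[k] * (x - nodes[0]) * ... * (x - nodes[k-1])."""
    total, product = Fraction(0), Fraction(1)
    for k, c in enumerate(coeffs):
        total += c * product
        product *= (x - nodes[k])
    return total


t = divided_differences([1, 2, 3, 4], [2, 2, 0, 0])

# Each form: (coefficients read from the table, nodes in order of appearance).
forms = [
    ([t[0][0], t[0][1], t[0][2], t[0][3]], [1, 2, 3, 4]),
    ([t[2][0], t[1][1], t[1][2], t[0][3]], [3, 2, 4, 1]),
    ([t[3][0], t[2][1], t[1][2], t[0][3]], [4, 3, 2, 1]),
    ([t[2][0], t[1][1], t[0][2], t[0][3]], [3, 2, 1, 4]),
]


def standard(x):
    return Fraction(2, 3) * x ** 3 - 5 * x ** 2 + Fraction(31, 3) * x - 4


# Polynomials of degree at most 3 that agree at more than 4 points are equal.
all_agree = all(newton_eval(x, c, nodes) == standard(x)
                for c, nodes in forms for x in range(-2, 7))
```

Note that the last node in each list is never used as a factor; it is listed only to show which datum the final coefficient brings in.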

6: Recursion 3.3.3 is used to compute the entries in the table, as in Table 3.3. Answers will depend on the order in
which the data are listed in the table and on how the data are read from the table. Placing the data in the
table in the order given in the question, we have

  $x_i$   $f(x_i)$   $f_{i,1}$   $f_{i,2}$
  1       .987       -.925       .809375
  2.2     -.123      .69375
  3       .432

Reading the coefficients across the first row, we use $f_{0,0}$, $f_{0,1}$, and $f_{0,2}$. This is a valid sequence to read from
the table since each coefficient depends on the same data as the previous, plus one point. $f_{0,0}$ depends on $x_0$;
$f_{0,1}$ depends on $x_0$ and $x_1$; and $f_{0,2}$ depends on $x_0, x_1$, and $x_2$. Therefore, one answer is

$$P_{0,2}(x) = .987 - .925(x - 1) + .809375(x - 1)(x - 2.2).$$

The sequence of coefficients $f_{0,0}$, $f_{1,1}$, $f_{1,2}$ is not a valid sequence to choose. $f_{0,0}$ depends on $x_0$ but $f_{1,1}$
depends on $x_1$ and $x_2$, two completely different data values from the first. Not to mention $f_{1,2}$, which is
not even part of the table. With some study, you might be able to draw the conclusion, and maybe even
prove, that any sequence of coefficients starting in the first column and progressing to the right one column
at a time and either jumping up one row or remaining in the same row with each change of column forms a
valid sequence. For example, we can use coefficients $f_{1,0}$, $f_{0,1}$, $f_{0,2}$ because $f_{1,0}$ depends on $x_1$; $f_{0,1}$ depends
on $x_1$ and $x_0$; and $f_{0,2}$ depends on $x_1, x_0$, and $x_2$. And the order in which new dependencies are encountered
matters. The $(x - x_i)$ monomials must appear in the same order. Therefore, another answer is

$$P_{0,2}(x) = -.123 - .925(x - 2.2) + .809375(x - 2.2)(x - 1).$$

The other two possible answers garnered from this same divided difference table are

$$P_{0,2}(x) = -.123 + .69375(x - 2.2) + .809375(x - 2.2)(x - 3)$$

$$P_{0,2}(x) = .432 + .69375(x - 3) + .809375(x - 3)(x - 2.2).$$

With some algebra and a bit of patience, each of the four forms above can be reduced to

$$P_{0,2}(x) = .809375x^2 - 3.515x + 3.692625.$$
10: Answers will depend on the order in which the data are listed in the Octave call and on how the data are read
from the table. Placing the data in the Octave command in the same order they are listed in the question,
your Octave code should produce something like

>> dividedDiffs([0, .1, .3, .6, 1], [-6, -5.89483, -5.65014, -5.17788, -4.28172])
ans =

  -6.00000   1.05170   0.57250   0.21500   0.06302
  -5.89483   1.22345   0.70150   0.27802   0.00000
  -5.65014   1.57420   0.95171   0.00000   0.00000
  -5.17788   2.24040   0.00000   0.00000   0.00000
  -4.28172   0.00000   0.00000   0.00000   0.00000


One possibility for the interpolating polynomial of degree (at most) four is 

$$P_{0,4}(x) = -6 + 1.05170x + .5725x(x - .1) + .215x(x - .1)(x - .3) + .06302x(x - .1)(x - .3)(x - .6).$$


See discussion of question 4 above for other possibilities. Adding the point (1.1, —3.9958) to the table, we get 
(accurate to 5 decimal places) 


$$\begin{aligned}
f_{5,0} &= -3.9958 \\
f_{4,1} &= \frac{-4.28172 + 3.9958}{1 - 1.1} = 2.8592 \\
f_{3,2} &= \frac{2.2404 - 2.8592}{.6 - 1.1} = 1.2376 \\
f_{2,3} &= \frac{.95171 - 1.2376}{.3 - 1.1} = .35736 \\
f_{1,4} &= \frac{.27802 - .35736}{.1 - 1.1} = .07934 \\
f_{0,5} &= \frac{.06302 - .07934}{0 - 1.1} = .01484.
\end{aligned}$$

Now we can add one more term to $P_{0,4}$ to get (one possible representation of) $P_{0,5}$:

$$P_{0,5}(x) = -6 + 1.05170x + .5725x(x - .1) + .215x(x - .1)(x - .3) + .06302x(x - .1)(x - .3)(x - .6) + .01484x(x - .1)(x - .3)(x - .6)(x - 1).$$

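
The point of the computation above is that adding a datum costs only one new diagonal of the table. A minimal sketch of that update, in Python rather than Octave, follows. The function name `extend_diagonal` and its argument layout are assumptions made for this illustration; the old bottom diagonal is read off the `dividedDiffs` output.

```python
def extend_diagonal(xs, bottom_diagonal, x_new, y_new):
    """Return the new bottom diagonal after appending the point (x_new, y_new).

    bottom_diagonal[j] is the divided difference on the last j+1 nodes of xs,
    that is, f[x_{n-j}, ..., x_n].
    """
    new = [y_new]
    for j, old in enumerate(bottom_diagonal):
        # The new difference spans x_{n-j}, ..., x_n, x_new.
        new.append((new[j] - old) / (x_new - xs[len(xs) - 1 - j]))
    return new


xs = [0, 0.1, 0.3, 0.6, 1]
old_diagonal = [-4.28172, 2.2404, 0.95171, 0.27802, 0.06302]
new_diagonal = extend_diagonal(xs, old_diagonal, 1.1, -3.9958)
# new_diagonal[-1] is the coefficient of the extra term in P_{0,5}.
```

The sketch carries full precision between steps, so intermediate entries may differ from the hand-rounded ones in the fifth decimal place.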


12: Since $N_n$, $L_n$, $P_{0,n}$ and $P_n$ are all the same polynomial except possibly the form in which they are written,
the error term for a Newton polynomial is the same as that for a Lagrange polynomial:

$$f(x) - P_n(x) = \frac{f^{(n+1)}(\xi_x)}{(n+1)!}(x - x_0)(x - x_1) \cdots (x - x_n).$$

In this particular case, we have

$$f(2) - P_2(2) = \frac{f^{(3)}(\xi_2)}{3!}(2 - 1)(2 - 2.2)(2 - 3).$$

Since all derivatives are bounded between $-2$ and $1$ over the interval $[1,3]$, $|f^{(3)}(\xi_2)| \le 2$ and, therefore, the
error has bound

$$|f(2) - P_2(2)| \le \frac{2}{3!}(1)(.2)(1) = \frac{1}{15}.$$




17: Since 0.75 is one of the nodes (it is $x_3$), $N_3$ and $f$ agree there. That is what it means for $N_3$ to interpolate the
data at $x_0, x_1, x_2, x_3$. Hence,

$$f(.75) = N_3(.75) = 1 + 4(.75) + 4(.75)(.75 - .25) + \frac{16}{3}(.75)(.75 - .25)(.75 - .5) = 6.$$
18: $f$ is periodic and has infinitely many roots regularly spread across the real axis. The only diagram showing
roots of this nature is (d) so $f$ matches with (d). $g$ and $f$ differ only by a small amount for large real values
so we should expect to see infinitely many more or less regularly spaced roots on the positive real axis. The
only diagram with roots of this nature is (a) so $g$ matches with (a). $l$ is a fifth degree polynomial so has at
most 5 roots. Diagram (b) shows 8 colors so 8 roots. Therefore, $h$ matches with (b) and $l$ matches with (c).
To recap, $f$ matches (d), $g$ matches (a), $h$ matches (b), and $l$ matches (c).



Section 4.1 


1: (a) $L_1(x) = \frac{x - x_1}{x_0 - x_1}f(x_0) + \frac{x - x_0}{x_1 - x_0}f(x_1)$

(b) $L_1'(x) = \frac{f(x_0)}{x_0 - x_1} + \frac{f(x_1)}{x_1 - x_0} = \frac{f(x_1) - f(x_0)}{x_1 - x_0}$

(c) $L_1'\left(x_0 + \frac{h}{2}\right) = \frac{f(x_0 + h) - f(x_0)}{x_0 + h - x_0} = \frac{f(x_0 + h) - f(x_0)}{h}$


4: (a) The Newton form of an interpolating polynomial derives from a table of divided differences whether it is a 
single value or a formula for a general case. The divided differences table for this case is 


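$$\begin{array}{c|ccc}
x_0 & f(x_0) & \frac{f(x_0+h) - f(x_0)}{h} & \frac{f(x_0+2h) - 2f(x_0+h) + f(x_0)}{2h^2} \\
x_0 + h & f(x_0 + h) & \frac{f(x_0+2h) - f(x_0+h)}{h} & \\
x_0 + 2h & f(x_0 + 2h) & &
\end{array}$$

where

$$f_{0,1} = \frac{f(x_0 + h) - f(x_0)}{(x_0 + h) - x_0} = \frac{f(x_0 + h) - f(x_0)}{h}$$

$$f_{1,1} = \frac{f(x_0 + 2h) - f(x_0 + h)}{(x_0 + 2h) - (x_0 + h)} = \frac{f(x_0 + 2h) - f(x_0 + h)}{h}$$

$$f_{0,2} = \frac{f_{1,1} - f_{0,1}}{(x_0 + 2h) - x_0} = \frac{\frac{f(x_0+2h) - f(x_0+h)}{h} - \frac{f(x_0+h) - f(x_0)}{h}}{2h} = \frac{f(x_0 + 2h) - 2f(x_0 + h) + f(x_0)}{2h^2}.$$

Therefore, one possibility for the Newton form is

$$N_2(x) = f(x_0) + \frac{f(x_0 + h) - f(x_0)}{h}(x - x_0) + \frac{f(x_0 + 2h) - 2f(x_0 + h) + f(x_0)}{2h^2}(x - x_0)(x - (x_0 + h)).$$

Making the substitution $x_0 + \theta h$ for $x$,

$$N_2(x_0 + \theta h) = f(x_0) + [f(x_0 + h) - f(x_0)]\theta + \frac{f(x_0 + 2h) - 2f(x_0 + h) + f(x_0)}{2}\theta(\theta - 1).$$

(b) $\frac{dx}{d\theta} = h$ and $\frac{d}{d\theta}N_2(x(\theta)) = \frac{d}{dx}N_2(x) \cdot \frac{dx}{d\theta}$ so $\frac{d}{dx}N_2(x) = \frac{d}{d\theta}N_2(x(\theta)) \div \frac{dx}{d\theta} = \frac{1}{h}\frac{d}{d\theta}N_2(x(\theta))$. Similarly, we
get $\frac{d^2}{dx^2}N_2(x) = \frac{1}{h^2}\frac{d^2}{d\theta^2}N_2(x(\theta))$. Hence,

$$\frac{d}{dx}N_2(x) = \frac{[f(x_0 + h) - f(x_0)] + \frac{f(x_0+2h) - 2f(x_0+h) + f(x_0)}{2}(2\theta - 1)}{h}$$

$$\frac{d^2}{dx^2}N_2(x) = \frac{f(x_0+2h) - 2f(x_0+h) + f(x_0)}{h \cdot h} = \frac{f(x_0 + 2h) - 2f(x_0 + h) + f(x_0)}{h^2}.$$

(c) $N_2''\left(x_0 + \frac{1}{2}h\right) = \frac{f(x_0 + 2h) - 2f(x_0 + h) + f(x_0)}{h^2}$, so

$$f''\left(x_0 + \frac{1}{2}h\right) \approx \frac{f(x_0 + 2h) - 2f(x_0 + h) + f(x_0)}{h^2}.$$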


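6c: To use this formula, we need $x_0 - h = 10$ and $x_0 + 6h = 17$, a system of two equations with two unknowns
whose solution is $x_0 = 11$ and $h = 1$. Plugging these values into formula 4.1.6:

$$\begin{aligned}
\int_{10}^{17} \frac{1}{x - 5}\,dx &\approx \frac{1}{8640}[5257f(17) - 5880f(16) + 59829f(15) - 81536f(14) + 102459f(13) - 50568f(12) + 30919f(11)] \\
&= \frac{1}{8640}\left[5257 \cdot \frac{1}{12} - 5880 \cdot \frac{1}{11} + 59829 \cdot \frac{1}{10} - 81536 \cdot \frac{1}{9} + 102459 \cdot \frac{1}{8} - 50568 \cdot \frac{1}{7} + 30919 \cdot \frac{1}{6}\right] \\
&\approx 0.8753962951271979.
\end{aligned}$$
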


7c: (i) $\int_{10}^{17} \frac{1}{x-5}\,dx = \ln|x - 5|\,\Big|_{10}^{17} = \ln(12) - \ln(5) = \ln\frac{12}{5} \approx 0.8754687373539001$ (ii) The absolute error is the
absolute value of the difference between the approximation and the exact value: $\left|\ln\frac{12}{5} - 0.8753962951271979\right| \approx 7.24(10)^{-5}$.
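
The arithmetic in 6c and 7c is easy to replay. The sketch below is in Python rather than Octave. The weights are copied from the worked solution of 6c (formula 4.1.6 itself is not restated here), and the variable names are chosen only for this illustration.

```python
import math

# Weights from the worked solution of 6c, paired with the nodes 11, 12, ..., 17.
weights = [30919, -50568, 102459, -81536, 59829, -5880, 5257]
nodes = range(11, 18)


def f(x):
    return 1.0 / (x - 5)


approx = sum(w * f(x) for w, x in zip(weights, nodes)) / 8640
exact = math.log(12 / 5)          # ln(12) - ln(5)
abs_error = abs(exact - approx)
```

One sanity check worth knowing: the weights sum to 60480, and 60480/8640 = 7 is the length of the interval of integration, exactly what is needed to integrate a constant correctly.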

11d: To approximate some quantity in regard to a non-polynomial function, we simply evaluate the corresponding
quantity for the interpolating polynomial. That means in this case, $f'(2) \approx p'(2)$. But $p'(x) = 12x^3 - 4x + 1$
so $f'(2) \approx 12 \cdot 2^3 - 4 \cdot 2 + 1 = 89$.


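12e: To approximate some quantity in regard to a non-polynomial function, we simply evaluate the corresponding
quantity for the interpolating polynomial. That means in this case, $\int_0^1 g(x)\,dx \approx \int_0^1 q(x)\,dx$:

$$\begin{aligned}
\int_0^1 g(x)\,dx &\approx \int_0^1 (-7x^4 + 3x^2 - x + 4)\,dx \\
&= \left[-\frac{7}{5}x^5 + x^3 - \frac{1}{2}x^2 + 4x\right]_0^1 \\
&= -\frac{7}{5} + 1 - \frac{1}{2} + 4 = \frac{31}{10}.
\end{aligned}$$
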


13d: To use this formula, we need only to substitute proper values for $\theta$ and the $\theta_i$. $\theta$ must be 0 since the point of
evaluation is at $x_0$ (which equals $x_0 + 0h$). It does not matter which stencil point gives which $\theta_i$, but the $\theta_i$
come from the fact that the nodes are $x_0 - h$, $x_0 + 2h$, and $x_0 + 3h$. That gives us $-1$, $2$, and $3$ for the $\theta_i$.
Setting $\theta_0 = -1$, $\theta_1 = 2$, and $\theta_2 = 3$:

$$\begin{aligned}
f'(x_0) \approx p'(x_0) &= \frac{(0 - 2) + (0 - 3)}{h(-1 - 2)(-1 - 3)}f(x_0 - h) + \frac{(0 - (-1)) + (0 - 3)}{h(2 - (-1))(2 - 3)}f(x_0 + 2h) + \frac{(0 - (-1)) + (0 - 2)}{h(3 - (-1))(3 - 2)}f(x_0 + 3h) \\
&= -\frac{5}{12h}f(x_0 - h) + \frac{2}{3h}f(x_0 + 2h) - \frac{1}{4h}f(x_0 + 3h) \\
&= \frac{-5f(x_0 - h) + 8f(x_0 + 2h) - 3f(x_0 + 3h)}{12h}.
\end{aligned}$$
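
The three coefficients can be double-checked by differentiating the Lagrange basis polynomials for a three-node stencil. This is a sketch in Python (not Octave), with a hypothetical helper name; it returns the weights in units of $1/h$ and then tries the formula on $f(x) = e^x$ at $x_0 = 0$, where the exact derivative is 1.

```python
import math
from fractions import Fraction


def three_point_derivative_weights(theta, thetas):
    """Weights w_i with f'(x0 + theta*h) ~ (1/h) * sum_i w_i f(x0 + thetas[i]*h).

    Each weight is the derivative, at theta, of the quadratic Lagrange basis
    polynomial attached to thetas[i].
    """
    weights = []
    for i in range(3):
        j, k = [m for m in range(3) if m != i]
        numerator = (theta - thetas[j]) + (theta - thetas[k])
        denominator = (thetas[i] - thetas[j]) * (thetas[i] - thetas[k])
        weights.append(Fraction(numerator, denominator))
    return weights


thetas = [-1, 2, 3]
w = three_point_derivative_weights(0, thetas)

# Try the formula on f(x) = e^x at x0 = 0 with a small step.
h = 0.01
approx = sum(float(wi) * math.exp(t * h) for wi, t in zip(w, thetas)) / h
```

The weights sum to zero, as they must for any derivative formula that is exact on constants.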
15c: The integral over this stencil is from Xg to Xg + 2h so 9 0 = 0 and Q\ = 2. The nodes are Xg + J h and Xg + | h
so 02 and 63 are | and |. It does not matter which is which. Setting 62 = | and $3 = |, the formula from 
question 14c becomes — | • ^5% ((2 ■ | — 2 — 0)f{xg + | h) — (2 • | — 2 — 0)f(xg + | h )), which simplifies to 


nXQ-\-2h 

/ f(x)dx « — 

• Xq ^ 


f ^0 + ^hj + 2/ ^x 0 + ^ h 


Section 4.2 

1d: We are trying to find the undetermined coefficients $a_i$ of formula 4.2.1. We solve system 4.2.2 to do so. The
stencil of this question has 2 nodes, $x_0$ and $x_0 + h$, and point of evaluation $x_0 + \frac{1}{2}h$, so in system 4.2.2 we
have $n = 1$, $\theta_0 = 0$ and $\theta_1 = 1$, and $\theta = \frac{1}{2}$. Because we are deriving a first derivative formula, we also have
$k = 1$. Therefore, the system we need to solve is

$$\begin{aligned}
p_0'\left(x_0 + \tfrac{1}{2}h\right) &= a_0 p_0(x_0) + a_1 p_0(x_0 + h) \\
p_1'\left(x_0 + \tfrac{1}{2}h\right) &= a_0 p_1(x_0) + a_1 p_1(x_0 + h).
\end{aligned}$$

Now, $p_0(x) = 1$ so $p_0'(x_0 + \frac{1}{2}h) = 0$; and $p_1(x) = x - x_0$ so $p_1'(x_0 + \frac{1}{2}h) = 1$. Substituting this information
into the system,

$$\begin{aligned}
0 &= a_0 + a_1 \\
1 &= a_1 h.
\end{aligned}$$

From the second equation, $a_1 = \frac{1}{h}$. Substituting into the first equation, $0 = a_0 + \frac{1}{h}$ so $a_0 = -\frac{1}{h}$. Our
approximation, formula 4.2.1, becomes

$$f'\left(x_0 + \tfrac{1}{2}h\right) \approx -\frac{1}{h}f(x_0) + \frac{1}{h}f(x_0 + h) = \frac{f(x_0 + h) - f(x_0)}{h}.$$

That formula should look familiar!

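1j: We are trying to find the undetermined coefficients $a_i$ of formula 4.2.1. We solve system 4.2.2 to do so. The
stencil of this question has 4 nodes, $x_0$, $x_0 + h$, $x_0 + \frac{3}{2}h$, and $x_0 + 2h$ with point of evaluation $x_0 + \frac{1}{2}h$, so
in system 4.2.2 we have $n = 3$, $\theta_0 = 0$, $\theta_1 = 1$, $\theta_2 = \frac{3}{2}$, $\theta_3 = 2$, and $\theta = \frac{1}{2}$. Because we are deriving a first
derivative formula, we also have $k = 1$. Therefore, the system we need to solve is

$$\begin{aligned}
p_0'\left(x_0 + \tfrac{1}{2}h\right) &= a_0 p_0(x_0) + a_1 p_0(x_0 + h) + a_2 p_0\left(x_0 + \tfrac{3}{2}h\right) + a_3 p_0(x_0 + 2h) \\
p_1'\left(x_0 + \tfrac{1}{2}h\right) &= a_0 p_1(x_0) + a_1 p_1(x_0 + h) + a_2 p_1\left(x_0 + \tfrac{3}{2}h\right) + a_3 p_1(x_0 + 2h) \\
p_2'\left(x_0 + \tfrac{1}{2}h\right) &= a_0 p_2(x_0) + a_1 p_2(x_0 + h) + a_2 p_2\left(x_0 + \tfrac{3}{2}h\right) + a_3 p_2(x_0 + 2h) \\
p_3'\left(x_0 + \tfrac{1}{2}h\right) &= a_0 p_3(x_0) + a_1 p_3(x_0 + h) + a_2 p_3\left(x_0 + \tfrac{3}{2}h\right) + a_3 p_3(x_0 + 2h).
\end{aligned}$$

Now, $p_0(x) = 1$ so $p_0'(x_0 + \frac{1}{2}h) = 0$; $p_1(x) = x - x_0$ so $p_1'(x_0 + \frac{1}{2}h) = 1$; $p_2(x) = (x - x_0)^2$ so $p_2'(x_0 + \frac{1}{2}h) = h$;
and $p_3(x) = (x - x_0)^3$ so $p_3'(x_0 + \frac{1}{2}h) = \frac{3}{4}h^2$. Substituting this information into the system,

$$\begin{aligned}
0 &= a_0 + a_1 + a_2 + a_3 \\
1 &= a_1 h + a_2 \cdot \tfrac{3}{2}h + a_3 \cdot 2h \\
h &= a_1 h^2 + a_2 \cdot \tfrac{9}{4}h^2 + a_3 \cdot 4h^2 \\
\tfrac{3}{4}h^2 &= a_1 h^3 + a_2 \cdot \tfrac{27}{8}h^3 + a_3 \cdot 8h^3.
\end{aligned}$$

The first equation is the only one in which $a_0$ appears so we concentrate on solving the last three equations,
which simplify to:

$$\begin{aligned}
\frac{2}{h} &= 2a_1 + 3a_2 + 4a_3 \\
\frac{4}{h} &= 4a_1 + 9a_2 + 16a_3 \\
\frac{6}{h} &= 8a_1 + 27a_2 + 64a_3.
\end{aligned}$$

From the first equation, $2a_1 = \frac{2}{h} - 3a_2 - 4a_3$ so $4a_1 = \frac{4}{h} - 6a_2 - 8a_3$ and $8a_1 = \frac{8}{h} - 12a_2 - 16a_3$. Substituting
into the second and third equations, respectively,

$$\begin{aligned}
\frac{4}{h} &= \frac{4}{h} - 6a_2 - 8a_3 + 9a_2 + 16a_3 \\
\frac{6}{h} &= \frac{8}{h} - 12a_2 - 16a_3 + 27a_2 + 64a_3
\end{aligned}$$

which simplifies to

$$\begin{aligned}
0 &= 3a_2 + 8a_3 \\
-\frac{2}{h} &= 15a_2 + 48a_3.
\end{aligned}$$

From the first equation, $a_3 = -\frac{3}{8}a_2$. Substituting into the last equation, $-\frac{2}{h} = 15a_2 + 48\left(-\frac{3}{8}a_2\right)$, which
simplifies to $-\frac{2}{h} = -3a_2$ so

$$a_2 = \frac{2}{3h}.$$

Back-substituting, $a_3 = -\frac{3}{8}a_2 = -\frac{3}{8}\left(\frac{2}{3h}\right)$ so

$$a_3 = -\frac{1}{4h}.$$

Continuing the back-substitution, $2a_1 = \frac{2}{h} - 3a_2 - 4a_3 = \frac{2}{h} - 3\left(\frac{2}{3h}\right) - 4\left(-\frac{1}{4h}\right)$, which simplifies to $2a_1 = \frac{1}{h}$ so

$$a_1 = \frac{1}{2h}.$$

Finally, $a_0 = -a_1 - a_2 - a_3 = -\frac{1}{2h} - \frac{2}{3h} + \frac{1}{4h}$ so

$$a_0 = -\frac{11}{12h}.$$

Our approximation, formula 4.2.1, thus becomes

$$\begin{aligned}
f'\left(x_0 + \tfrac{1}{2}h\right) &\approx -\frac{11}{12h}f(x_0) + \frac{1}{2h}f(x_0 + h) + \frac{2}{3h}f\left(x_0 + \tfrac{3}{2}h\right) - \frac{1}{4h}f(x_0 + 2h) \\
&= \frac{-11f(x_0) + 6f(x_0 + h) + 8f\left(x_0 + \tfrac{3}{2}h\right) - 3f(x_0 + 2h)}{12h}.
\end{aligned}$$
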


2f: We are trying to find the undetermined coefficients $a_i$ of formula 4.2.1. We solve system 4.2.2 to do so. The
stencil of this question has 4 nodes, $x_0$, $x_0 + h$, $x_0 + \frac{3}{2}h$, and $x_0 + 2h$ with point of evaluation $x_0 + \frac{1}{2}h$, so
in system 4.2.2 we have $n = 3$, $\theta_0 = 0$, $\theta_1 = 1$, $\theta_2 = \frac{3}{2}$, $\theta_3 = 2$, and $\theta = \frac{1}{2}$. Because we are deriving a second
derivative formula, we also have $k = 2$. Therefore, the system we need to solve is

$$\begin{aligned}
p_0''\left(x_0 + \tfrac{1}{2}h\right) &= a_0 p_0(x_0) + a_1 p_0(x_0 + h) + a_2 p_0\left(x_0 + \tfrac{3}{2}h\right) + a_3 p_0(x_0 + 2h) \\
p_1''\left(x_0 + \tfrac{1}{2}h\right) &= a_0 p_1(x_0) + a_1 p_1(x_0 + h) + a_2 p_1\left(x_0 + \tfrac{3}{2}h\right) + a_3 p_1(x_0 + 2h) \\
p_2''\left(x_0 + \tfrac{1}{2}h\right) &= a_0 p_2(x_0) + a_1 p_2(x_0 + h) + a_2 p_2\left(x_0 + \tfrac{3}{2}h\right) + a_3 p_2(x_0 + 2h) \\
p_3''\left(x_0 + \tfrac{1}{2}h\right) &= a_0 p_3(x_0) + a_1 p_3(x_0 + h) + a_2 p_3\left(x_0 + \tfrac{3}{2}h\right) + a_3 p_3(x_0 + 2h).
\end{aligned}$$

Now, $p_0(x) = 1$ so $p_0''(x_0 + \frac{1}{2}h) = 0$; $p_1(x) = x - x_0$ so $p_1''(x_0 + \frac{1}{2}h) = 0$; $p_2(x) = (x - x_0)^2$ so $p_2''(x_0 + \frac{1}{2}h) = 2$;
and $p_3(x) = (x - x_0)^3$ so $p_3''(x_0 + \frac{1}{2}h) = 3h$. Substituting this information into the system,

$$\begin{aligned}
0 &= a_0 + a_1 + a_2 + a_3 \\
0 &= a_1 h + a_2 \cdot \tfrac{3}{2}h + a_3 \cdot 2h \\
2 &= a_1 h^2 + a_2 \cdot \tfrac{9}{4}h^2 + a_3 \cdot 4h^2 \\
3h &= a_1 h^3 + a_2 \cdot \tfrac{27}{8}h^3 + a_3 \cdot 8h^3.
\end{aligned}$$

The first equation is the only one in which $a_0$ appears so we concentrate on solving the last three equations,
which simplify to:

$$\begin{aligned}
0 &= 2a_1 + 3a_2 + 4a_3 \\
\frac{8}{h^2} &= 4a_1 + 9a_2 + 16a_3 \\
\frac{24}{h^2} &= 8a_1 + 27a_2 + 64a_3.
\end{aligned}$$

From the first equation, $2a_1 = -3a_2 - 4a_3$ so $4a_1 = -6a_2 - 8a_3$ and $8a_1 = -12a_2 - 16a_3$. Substituting into
the second and third equations, respectively,

$$\begin{aligned}
\frac{8}{h^2} &= -6a_2 - 8a_3 + 9a_2 + 16a_3 \\
\frac{24}{h^2} &= -12a_2 - 16a_3 + 27a_2 + 64a_3
\end{aligned}$$

which simplifies to

$$\begin{aligned}
\frac{8}{h^2} &= 3a_2 + 8a_3 \\
\frac{24}{h^2} &= 15a_2 + 48a_3.
\end{aligned}$$

Five times the first equation minus the second equation yields $\frac{16}{h^2} = -8a_3$ so

$$a_3 = -\frac{2}{h^2}.$$

Back-substituting, $\frac{8}{h^2} = 3a_2 + 8a_3 = 3a_2 + 8\left(-\frac{2}{h^2}\right)$ so

$$a_2 = \frac{8}{h^2}.$$

Continuing the back-substitution, $2a_1 = -3a_2 - 4a_3 = -3\left(\frac{8}{h^2}\right) - 4\left(-\frac{2}{h^2}\right)$, which simplifies to $2a_1 = -\frac{16}{h^2}$ so

$$a_1 = -\frac{8}{h^2}.$$

Finally, $a_0 = -a_1 - a_2 - a_3 = \frac{8}{h^2} - \frac{8}{h^2} + \frac{2}{h^2}$ so

$$a_0 = \frac{2}{h^2}.$$

Our approximation, formula 4.2.1, thus becomes

$$\begin{aligned}
f''\left(x_0 + \tfrac{1}{2}h\right) &\approx \frac{2}{h^2}f(x_0) - \frac{8}{h^2}f(x_0 + h) + \frac{8}{h^2}f\left(x_0 + \tfrac{3}{2}h\right) - \frac{2}{h^2}f(x_0 + 2h) \\
&= \frac{2f(x_0) - 8f(x_0 + h) + 8f\left(x_0 + \tfrac{3}{2}h\right) - 2f(x_0 + 2h)}{h^2}.
\end{aligned}$$
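
The back-substitution in 1j and 2f is mechanical enough to hand to a computer. Below is a sketch in Python (not the book's Octave) that builds the undetermined-coefficients system with $h = 1$ and solves it exactly. The function names `solve` and `derivative_weights` are hypothetical, and scaling by $h$ is left to the reader: the returned numbers are the $a_i$ multiplied by $h^k$.

```python
from fractions import Fraction


def solve(A, b):
    """Gauss-Jordan elimination in exact Fraction arithmetic."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def derivative_weights(thetas, theta, k):
    """Solve sum_i c_i * thetas[i]**j = (d^k/dt^k t**j)(theta) for j = 0..n."""
    n = len(thetas)
    A = [[Fraction(t) ** j for t in thetas] for j in range(n)]
    b = []
    for j in range(n):
        if j < k:
            b.append(Fraction(0))
        else:
            coeff = 1
            for m in range(j, j - k, -1):   # j * (j-1) * ... * (j-k+1)
                coeff *= m
            b.append(coeff * Fraction(theta) ** (j - k))
    return solve(A, b)


stencil = [0, 1, Fraction(3, 2), 2]
first = derivative_weights(stencil, Fraction(1, 2), 1)    # question 1j
second = derivative_weights(stencil, Fraction(1, 2), 2)   # question 2f
```

Dividing `first` by $h$ and `second` by $h^2$ recovers the coefficients found by hand above.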
4b: We are trying to find the undetermined coefficients a* of formula 4.2.3. We solve system 4.2.4 to do so. The
stencil of this question has 1 node, Xo + | h and endpoints of integration Xq and xq + 2 h, so in system 4.2.4 
we have n = 0, a = xq and b = Xq + 2 h. Therefore, the “system” we need to solve is 

f»#o+2 h 

p 0 (x)dx = a 0 p 0 (x 0 ). 


’ XQ 

Now, p 0 (x) = 1 so f*° +2h po(x)dx = f*" +2h dx = 2 h. Substituting this information into the system, 


2 h = ao- 


Our approximation, formula 4.2.3, becomes 

rxo+2 h / o \ 

j f{x)dx « 2hf + -hj . 

4l: We are trying to find the undetermined coefficients $a_i$ of formula 4.2.3. We solve system 4.2.4 to do so. The
stencil of this question has 3 nodes, $x_0$, $x_0 + h$, and $x_0 + 2h$ with endpoints of integration $x_0$ and $x_0 + 2h$, so
in system 4.2.4 we have $n = 2$, $a = x_0$ and $b = x_0 + 2h$. Therefore, the system we need to solve is

$$\begin{aligned}
\int_{x_0}^{x_0+2h} p_0(x)\,dx &= a_0 p_0(x_0) + a_1 p_0(x_0 + h) + a_2 p_0(x_0 + 2h) \\
\int_{x_0}^{x_0+2h} p_1(x)\,dx &= a_0 p_1(x_0) + a_1 p_1(x_0 + h) + a_2 p_1(x_0 + 2h) \\
\int_{x_0}^{x_0+2h} p_2(x)\,dx &= a_0 p_2(x_0) + a_1 p_2(x_0 + h) + a_2 p_2(x_0 + 2h).
\end{aligned}$$

Now, $p_0(x) = 1$ so $\int_{x_0}^{x_0+2h} p_0(x)\,dx = \int_{x_0}^{x_0+2h} dx = 2h$; $p_1(x) = x - x_0$ so $\int_{x_0}^{x_0+2h} p_1(x)\,dx = \int_{x_0}^{x_0+2h} (x - x_0)\,dx = \frac{1}{2}(x - x_0)^2 \Big|_{x_0}^{x_0+2h} = 2h^2$; and $p_2(x) = (x - x_0)^2$ so $\int_{x_0}^{x_0+2h} p_2(x)\,dx = \int_{x_0}^{x_0+2h} (x - x_0)^2\,dx = \frac{1}{3}(x - x_0)^3 \Big|_{x_0}^{x_0+2h} = \frac{8}{3}h^3$. Substituting this information into the system,

$$\begin{aligned}
2h &= a_0 + a_1 + a_2 \\
2h^2 &= a_1 h + a_2(2h) \\
\tfrac{8}{3}h^3 &= a_1 h^2 + a_2(4h^2).
\end{aligned}$$

The first equation is the only one in which $a_0$ appears so we concentrate on the last two equations, which
simplify to:

$$\begin{aligned}
2h &= a_1 + 2a_2 \\
\tfrac{8}{3}h &= a_1 + 4a_2.
\end{aligned}$$

From the first equation, $a_1 = 2h - 2a_2$. Substituting into the second equation, $\frac{8}{3}h = 2h - 2a_2 + 4a_2$, which
simplifies to $\frac{2}{3}h = 2a_2$, so

$$a_2 = \frac{1}{3}h.$$

Back-substituting, $a_1 = 2h - 2a_2 = 2h - 2\left(\frac{1}{3}h\right)$ so

$$a_1 = \frac{4}{3}h.$$

Finally, $a_0 = 2h - a_1 - a_2 = 2h - \frac{4}{3}h - \frac{1}{3}h$ so

$$a_0 = \frac{1}{3}h.$$

Our approximation, formula 4.2.3, thus becomes

$$\begin{aligned}
\int_{x_0}^{x_0+2h} f(x)\,dx &\approx \frac{1}{3}hf(x_0) + \frac{4}{3}hf(x_0 + h) + \frac{1}{3}hf(x_0 + 2h) \\
&= \frac{h}{3}[f(x_0) + 4f(x_0 + h) + f(x_0 + 2h)].
\end{aligned}$$

You may recognize this formula as Simpson’s rule!
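
By construction, the weights just derived integrate $1$, $x - x_0$, and $(x - x_0)^2$ exactly. A short exact-arithmetic sketch (Python, not Octave, with illustrative names and arbitrarily chosen values of $x_0$ and $h$) confirms this and shows something the derivation did not ask for: cubics are integrated exactly too, and the first failure comes at degree 4.

```python
from fractions import Fraction


def simpson(f, x0, h):
    """Simpson's rule on [x0, x0 + 2h] with the weights h/3, 4h/3, h/3."""
    return Fraction(h) / 3 * (f(x0) + 4 * f(x0 + h) + f(x0 + 2 * h))


def exact_monomial_integral(j, a, b):
    """Exact integral of x**j over [a, b]."""
    return (Fraction(b) ** (j + 1) - Fraction(a) ** (j + 1)) / (j + 1)


# Arbitrary illustrative values; any x0 and h > 0 behave the same way.
x0, h = Fraction(1, 2), Fraction(3, 4)

# Rule minus exact value for the monomials 1, x, x^2, x^3, x^4.
errors = [simpson(lambda x, j=j: x ** j, x0, h)
          - exact_monomial_integral(j, x0, x0 + 2 * h)
          for j in range(5)]
```

The nonzero entry for $x^4$ equals $\frac{4}{15}h^5$, consistent with the familiar $\frac{h^5}{90}f^{(4)}$ error term for Simpson's rule since $f^{(4)} = 24$ there.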
Section 4.3

3a: Simpson’s rule for integral approximation is $\int_{x_0}^{x_0+2h} f(x)\,dx \approx \frac{h}{3}[f(x_0) + 4f(x_0 + h) + f(x_0 + 2h)]$. To apply
it to the integral $\int_{-0.5}^{0} x\ln(x + 1)\,dx$ we need to identify $f$, $x_0$, and $h$. In the formula, $x_0$ is the lower limit of
integration, so we have $x_0 = -0.5$ in this question. In the formula, the length of the interval of integration
is $2h$, so we have $2h = 0.5$ in this question, or $h = 0.25$. In the formula, $f$ is the integrand, so we have
$f(x) = x\ln(x + 1)$. With the parameters identified, we plug them into the right side of Simpson’s rule and we
have our estimate:

$$\int_{-0.5}^{0} x\ln(x + 1)\,dx \approx \frac{0.25}{3}[-0.5\ln(0.5) + 4(-0.25)\ln(.75) + 0\ln(1)] \approx 0.05285463856097945.$$


4a: The trapezoidal rule for integral approximation is $\int_{x_0}^{x_0+h} f(x)\,dx \approx \frac{h}{2}[f(x_0) + f(x_0 + h)]$. To apply it to the integral
$\int_{-0.5}^{0} x\ln(x + 1)\,dx$ we need to identify $f$, $x_0$, and $h$. In the formula, $x_0$ is the lower limit of integration,
so we have $x_0 = -0.5$ in this question. In the formula, the length of the interval of integration is $h$, so we
have $h = 0.5$ in this question. In the formula, $f$ is the integrand, so we have $f(x) = x\ln(x + 1)$. With the
parameters identified, we plug them into the right side of the trapezoidal rule and we have our estimate:

$$\int_{-0.5}^{0} x\ln(x + 1)\,dx \approx \frac{0.5}{2}[-0.5\ln(0.5) + 0\ln(1)] \approx 0.08664339756999316.$$


5a: The midpoint rule for integral approximation is $\int_{x_0}^{x_0+2h} f(x)\,dx \approx 2hf(x_0 + h)$. To apply it to the integral
$\int_{-0.5}^{0} x\ln(x + 1)\,dx$,
we need to identify $f$, $x_0$, and $h$. In the formula, $x_0$ is the lower limit of integration, so we have $x_0 = -0.5$
in this question. In the formula, the length of the interval of integration is $2h$, so we have $2h = 0.5$ in this
question, or $h = 0.25$. In the formula, $f$ is the integrand, so we have $f(x) = x\ln(x + 1)$. With the parameters
identified, we plug them into the right side of the midpoint rule and we have our estimate:

$$\int_{-0.5}^{0} x\ln(x + 1)\,dx \approx 2(.25)(-0.25\ln(0.75)) \approx 0.03596025905647261.$$


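6a: Using integration by parts,

$$\begin{aligned}
\int_{-0.5}^{0} x\ln(x + 1)\,dx &= \left[\frac{x^2}{2}\ln(x + 1)\right]_{-0.5}^{0} - \frac{1}{2}\int_{-0.5}^{0} \frac{x^2}{x + 1}\,dx \\
&= -\frac{(-0.5)^2}{2}\ln(.5) - \frac{1}{2}\int_{-0.5}^{0} \left(x - 1 + \frac{1}{x + 1}\right)dx \\
&= -.125\ln(.5) - \frac{1}{2}\left[\frac{x^2}{2} - x + \ln|x + 1|\right]_{-0.5}^{0} \\
&= -.125\ln(.5) + \frac{1}{2}\left[\frac{.25}{2} + .5 + \ln(.5)\right] \\
&= .3125 + .375\ln(.5) \\
&\approx 0.05256980729002053
\end{aligned}$$

so the error is $|0.05285463856097945 - 0.05256980729002053| \approx 2.8483(10)^{-4}$.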
7a: See above for the exact evaluation of the integral. The error follows as
$|0.08664339756999316 - 0.05256980729002053| \approx 0.034073$.

8a: See above for the exact evaluation of the integral. The error follows as
$|0.03596025905647261 - 0.05256980729002053| \approx 0.016609$.
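
The three estimates and their errors can be gathered in one place. This is a Python sketch (the book's own computations are done in Octave), with variable names chosen for illustration; the exact value uses the closed form $.3125 + .375\ln(.5)$ found in 6a.

```python
import math


def f(x):
    return x * math.log(x + 1)


a, b = -0.5, 0.0
mid = (a + b) / 2

simpson = (b - a) / 6 * (f(a) + 4 * f(mid) + f(b))   # h = (b - a) / 2
trapezoid = (b - a) / 2 * (f(a) + f(b))              # h = b - a
midpoint = (b - a) * f(mid)                          # 2h = b - a

exact = 0.3125 + 0.375 * math.log(0.5)
errors = {
    "simpson": abs(simpson - exact),
    "trapezoid": abs(trapezoid - exact),
    "midpoint": abs(midpoint - exact),
}
```

For this integrand the midpoint error is about half the trapezoidal error, and Simpson's rule beats both by roughly two orders of magnitude.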

11a: $-\frac{h^2}{6}f'''(\xi_h)$ is the error term for this approximation formula. The remainder of the equation is the approximation. We simply plug the given information into the approximation formula:

$$f'(x_0) \approx \frac{f(x_0 + h) - f(x_0 - h)}{2h} = \frac{e^{2.1} - e^{1.9}}{2(.1)} \approx 7.401377351441916.$$



12a: The error term, $-\frac{h^2}{6}f'''(\xi_h)$, dictates the error. As in Taylor’s Theorem, this error term is exact for some
value of $\xi_h$. Finding a bound on the error means minimizing or maximizing $\left|\frac{h^2}{6}f'''(\xi_h)\right|$ over all possible values
of $\xi_h$. The possible values of $\xi_h$ are all values between the least node and the greatest node, a fact that follows
from Taylor’s Theorem. For this question, $h = .1$ and $f'''(\xi) = e^{\xi}$, so a lower bound for the error is

$$\frac{.1^2}{6} \min_{\xi \in [1.9,2.1]} e^{\xi}$$

and an upper bound is

$$\frac{.1^2}{6} \max_{\xi \in [1.9,2.1]} e^{\xi}.$$

But $e^{\xi}$ is an increasing function, so its minimum value over $[1.9, 2.1]$ occurs at 1.9 and its maximum at 2.1.
Hence, we have the error between $\frac{.01}{6}e^{1.9}$ and $\frac{.01}{6}e^{2.1}$, or as floating point approximations, 0.01114315740379878
and 0.01361028318761275. $f'(x) = e^x$ so $f'(2) = e^2$ exactly. The actual error is thus $|e^2 - 7.401377351441916| \approx
0.01232125251126526$, which is between the bounds.


13a: The full details of the formula include the implied qualification “for some $\xi_h \in (x_0 - h, x_0 + h)$”, the interval
being decided by the least and greatest nodes. So we search for a value of $\xi_h$ so that

$$f'(x_0) = \frac{f(x_0 + h) - f(x_0 - h)}{2h} - \frac{h^2}{6}f'''(\xi_h)$$

and $\xi_h \in (x_0 - h, x_0 + h)$. $f$, $x_0$, and $h$ are given, so we substitute them into this equation and solve. But
first, note $f'(x) = e^x$ and $f'''(x) = e^x$:

$$\begin{aligned}
e^2 &= \frac{e^{2.1} - e^{1.9}}{.2} - \frac{.01}{6}e^{\xi_h} \\
\frac{.01}{6}e^{\xi_h} &= \frac{e^{2.1} - e^{1.9} - .2e^2}{.2} \\
\xi_h &= \ln(3000(e^{2.1} - e^{1.9} - .2e^2)) \\
&\approx 2.00049999404725,
\end{aligned}$$

and $\xi_h \in (1.9, 2.1)$ as required.
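
Questions 11a through 13a fit together nicely in a few lines of code. The sketch below is in Python rather than Octave and uses names invented for the illustration. It computes the centered-difference estimate, the two error bounds, and the value of $\xi_h$ that makes the error term exact.

```python
import math

x0, h = 2.0, 0.1

# 11a: the centered-difference estimate of f'(2) for f(x) = e^x.
approx = (math.exp(x0 + h) - math.exp(x0 - h)) / (2 * h)

# 12a: bounds on |h^2/6 * f'''(xi)| for xi in [x0 - h, x0 + h].
lower = h ** 2 / 6 * math.exp(x0 - h)
upper = h ** 2 / 6 * math.exp(x0 + h)
actual_error = abs(math.exp(x0) - approx)

# 13a: solve h^2/6 * e^xi = actual_error for xi.
xi = math.log(6 * actual_error / h ** 2)
```

That $\xi_h$ lands so close to the middle of the interval is no accident for a smooth function and a symmetric stencil, though the theorem itself promises only that it lies somewhere inside.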

15: The degree of precision is 4 since the error term involves the fifth derivative of /. The fifth derivative of any 
polynomial of degree 4 or less is identically zero, so if / is any polynomial of degree 4 or less, the error in 
using the approximation formula is zero. 


17c: The error in any approximation formula is the difference between the two sides. One side holds the exact 
quantity and the other holds the approximation. To find the error, we subtract the two sides from one another, 
expand each appearance of / in a Taylor series about Xq and simplify. The term of least degree remaining 
determines the error term. 
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The left side of this approximation is f x ° + ' f(x)dx , so replace f(x) by f(xo) + {x — Xo)f'(xo) + ^(x — 
x 0 y 2 f"{x 0 ) + ±(x - x 0 ) 3 f'"(x 0 ) H : 


rx-o+h /■ 

/ f(x)dx = / 

J Xq j X 


f{x o) + {x- x 0 )f'(x o) + x - x 0 ) 2 f"(x o) + ^(x - x 0 ) 3 f'"(x 0 ) + 


xo+h 

Xq 

xf{x o) + ^(x - x 0 ) 2 f(x 0 ) + ^(x - x 0 ) 3 f"(x 0 ) + ^(x - x 0 ) 4 f'"(x 0 ) + ■ ■ 


dx 

-i xo+h 


= hf(x o) + \h 2 f{ x 0 ) + ^h 3 f”(x o) + + • • • . 

The right side of the approximation includes f(x o+ |/i), so this expression is also expanded in a Taylor series: 

/ (jco + ^hj = f(x o) + ^hf{x 0 ) + ^h 2 f'{x o) + ^/i 3 /'"(>o) + • • • . 

Substitute these expansions into the difference of the two sides and simplify. The error is 

hf(x 0 ) + ldi 2 f\x o) + ^h 3 f"{x o) + ^j-* 4 /"'(a:o) + • 


3 ( /Oo) + \ h fi xo) + ^h 2 f"{x 0 ) + ^-h 3 /'"(a:o) + • • • ) + f(x 0 ) 


hf{x 0 ) + ^/i 2 /'(a:o) + ^ 3 /"(a:o) + ^j-* 4 /'"(a:o) + • • ■ 
- (hf{x 0 ) + ^ 2 /'(a:o) + jU 3 /"Oo) + ^ h 4 f'"(x 0 ) + • • 


216^0) + - • 

Work done heretofore is informal evidence that the error term is 0(h 4 f"'(^h))- To formalize, we truncate the 
Taylor series, making them Taylor polynomials of convenient degree, with error terms! The error terms from 
the Taylor polynomials become the error term for the approximation formula. Beginning with the left side of 
the formula, the exact value: 


/•Xo + h rxo+h 

/ f(x)dx = / 

j Xc\ j X n 


f(x o) + (x - xo)f'(xo) + ^(x - x 0 ) 2 f"(x 0 ) + ^(x - x 0 ) 3 f'"({ x ) 

l xo + h 


dx 


xf{x o) + ^(x - X 0 ) 2 f'{x o) + ^{x - x 0 ) 3 f"(x 0 ) 

rxo+h i 

+ / -(x-x 0 ) 3 f"'(£ x )dx 

Jxo 6 

1 1 p x Q -\-h -j 

= hf(xo) + -h 2 f'(x 0 ) + -h 3 f"(x 0 ) + ~(x - x 0 ) 3 f" (£ x )da 

2 6 Jxo 6 


for some unknown function of x. Now, the f(x o + | h) term from the right side of the formula, the 
approximate value: 

/ + ^hj = f(x 0 ) + ^hf(x 0 ) + ^h 2 f"{x 0 ) + ^j-/i 3 /"'(£i) 

for some £ ( Xq,xq + h). Subtracting the two sides, we know all terms with derivative lower than the third 
will drop out since none of those terms have changed since our discovery. The error is, therefore, 

rxo+h i u 4 

/ ~(x - xo ) 3 f"'(&)dx - T • 3 • -rh 3 r\i I). 


6 ' 


81 
The Weighted Mean Value Theorem allows us to replace $\int_{x_0}^{x_0+h} \frac{1}{6}(x-x_0)^3 f'''(\xi_x)\,dx$ by $\frac{1}{6}f'''(c)\int_{x_0}^{x_0+h}(x-x_0)^3\,dx = \frac{1}{24}h^4 f'''(c)$ for some $c \in (x_0, x_0+h)$. The error term thus becomes

\[
\tfrac{1}{24}h^4 f'''(c) - \tfrac{1}{27}h^4 f'''(\xi_1)
\]

for some $c \in (x_0, x_0+h)$ and some $\xi_1 \in (x_0, x_0+h)$. The final formality is to replace this term with big-O notation:

\[
\begin{aligned}
\left| \tfrac{1}{24}h^4 f'''(c) - \tfrac{1}{27}h^4 f'''(\xi_1) \right| &\le h^4 \left( \tfrac{1}{24}|f'''(c)| + \tfrac{1}{27}|f'''(\xi_1)| \right) \\
&\le h^4 \left( \tfrac{1}{24} + \tfrac{1}{27} \right) \max\{ |f'''(c)|, |f'''(\xi_1)| \} \\
&= M h^4 |f'''(\xi_h)|
\end{aligned}
\]

for some $\xi_h \in (x_0, x_0+h)$ and $M = \frac{1}{24} + \frac{1}{27} = \frac{17}{216}$ (the value of $\xi_h$ is either $c$ or $\xi_1$). Hence, the error is

\[
O(h^4 f'''(\xi_h)).
\]
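The order of the error can also be checked numerically. The following is a minimal Octave sketch, assuming the formula under analysis is $\int_{x_0}^{x_0+h} f(x)\,dx \approx \frac{h}{4}\left[3f(x_0 + \frac{2}{3}h) + f(x_0)\right]$ and using the test function $f(x) = e^x$ with $x_0 = 0$ (both the test function and the variable names are illustrative choices, not part of the exercise). If the error is $O(h^4)$, halving $h$ should divide the error by about 16.

```octave
% Assumed formula: integral over [x0, x0+h] ~ (h/4)*(3*f(x0 + 2h/3) + f(x0))
f = @(x) exp(x);                 % illustrative test function
x0 = 0;
rule  = @(h) (h/4) * (3*f(x0 + 2*h/3) + f(x0));
exact = @(h) exp(x0 + h) - exp(x0);

hs = [0.1 0.05 0.025];
errs = zeros(size(hs));
for i = 1:numel(hs)
  errs(i) = abs(exact(hs(i)) - rule(hs(i)));
end

% Ratios of successive errors; values near 2^4 = 16 support O(h^4)
ratios = errs(1:end-1) ./ errs(2:end)
```

The ratios drift toward 16 as $h$ shrinks, which is what an $h^4$ leading term predicts.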


18c: The error in any approximation formula is the difference between the two sides. One side holds the exact quantity and the other holds the approximation. To find the error, we subtract the two sides from one another, expand each appearance of $f$ in a Taylor series about $x_0$ and simplify. The term of least degree remaining determines the error term.

\[
f'(x_0) \approx \frac{-3f(x_0) + 4f\left(x_0 + \frac{h}{2}\right) - f(x_0 + h)}{h}
\]

The left side of this approximation is $f'(x_0)$, so its Taylor expansion is itself! The right side of the approximation includes $f(x_0 + \frac{1}{2}h)$ and $f(x_0 + h)$, so these expressions are expanded in Taylor series:

\[
\begin{aligned}
f\left(x_0 + \tfrac{1}{2}h\right) &= f(x_0) + \tfrac{1}{2}h f'(x_0) + \tfrac{1}{8}h^2 f''(x_0) + \tfrac{1}{48}h^3 f'''(x_0) + \cdots \\
f(x_0 + h) &= f(x_0) + h f'(x_0) + \tfrac{1}{2}h^2 f''(x_0) + \tfrac{1}{6}h^3 f'''(x_0) + \cdots.
\end{aligned}
\]

To simplify the display of the algebra, we begin by summing $-3f(x_0) + 4f(x_0 + \frac{h}{2}) - f(x_0 + h)$:

\[
\begin{array}{rcl}
-3f(x_0) &=& -3f(x_0) \\
4f(x_0 + \tfrac{1}{2}h) &=& 4f(x_0) + 2hf'(x_0) + \tfrac{1}{2}h^2 f''(x_0) + \tfrac{1}{12}h^3 f'''(x_0) + \cdots \\
-f(x_0 + h) &=& -f(x_0) - hf'(x_0) - \tfrac{1}{2}h^2 f''(x_0) - \tfrac{1}{6}h^3 f'''(x_0) - \cdots \\
\hline
-3f(x_0) + 4f(x_0 + \tfrac{h}{2}) - f(x_0 + h) &=& hf'(x_0) - \tfrac{1}{12}h^3 f'''(x_0) + \cdots.
\end{array}
\]

The difference of the two sides is then

\[
f'(x_0) - \frac{1}{h}\left[ hf'(x_0) - \tfrac{1}{12}h^3 f'''(x_0) + \cdots \right] = \tfrac{1}{12}h^2 f'''(x_0) + \cdots.
\]

Work done heretofore is informal evidence that the error term is $O(h^2 f'''(\xi_h))$. To formalize, we truncate the Taylor series, making them Taylor polynomials of convenient degree, with error terms! The error terms from the Taylor polynomials become the error term for the approximation formula. The left side, again, is a Taylor expansion! Now, the $f(x_0 + \frac{1}{2}h)$ and $f(x_0 + h)$ terms from the right side of the formula:

\[
\begin{aligned}
f\left(x_0 + \tfrac{1}{2}h\right) &= f(x_0) + \tfrac{1}{2}hf'(x_0) + \tfrac{1}{8}h^2 f''(x_0) + \tfrac{1}{48}h^3 f'''(\xi_1) \\
f(x_0 + h) &= f(x_0) + hf'(x_0) + \tfrac{1}{2}h^2 f''(x_0) + \tfrac{1}{6}h^3 f'''(\xi_2)
\end{aligned}
\]

for some $\xi_1, \xi_2 \in (x_0, x_0 + h)$. Subtracting the two sides, we know all terms with derivative lower than the third will drop out since none of those terms have changed since our discovery. The remaining terms, those with the third derivative in them, are the error:

\[
\frac{-4\cdot\frac{1}{48}h^3 f'''(\xi_1) + \frac{1}{6}h^3 f'''(\xi_2)}{h} = h^2\left[ \tfrac{1}{6}f'''(\xi_2) - \tfrac{1}{12}f'''(\xi_1) \right]
\]

for some $\xi_1, \xi_2 \in (x_0, x_0 + h)$. The final formality is to replace this term with big-O notation:

\[
\begin{aligned}
h^2\left| \tfrac{1}{6}f'''(\xi_2) - \tfrac{1}{12}f'''(\xi_1) \right| &\le h^2\left( \tfrac{1}{6} + \tfrac{1}{12} \right)\max\{ |f'''(\xi_1)|, |f'''(\xi_2)| \} \\
&= Mh^2 |f'''(\xi_h)|
\end{aligned}
\]

for some $\xi_h \in (x_0, x_0 + h)$ and $M = \frac{1}{6} + \frac{1}{12} = \frac{1}{4}$ (the value of $\xi_h$ is either $\xi_2$ or $\xi_1$). Hence, the error is

\[
O(h^2 f'''(\xi_h)).
\]

19: Diffy Renee is using a second derivative formula with $x_0 = 3$ since the left side is $f''(3.0)$. On the right side, we see a term with sin(3) in it. This is likely $\sin(x_0)$ from one of the second derivative formulas. We also see sin(2.8) and sin(3.2), which look likely to play the roles of $\sin(x_0 - h)$ and $\sin(x_0 + h)$ in the approximation formula used. Looking at table 4.3 for a formula with $f(x_0 - h)$, $f(x_0)$, and $f(x_0 + h)$ in it, we find $f''(x_0) = \frac{f(x_0 - h) - 2f(x_0) + f(x_0 + h)}{h^2} + O(h^2 f^{(4)}(\xi_h))$. Continuing with the hypothesis that we have $f(x) = \sin(x)$, $x_0 = 3$, and $h = .2$, we plug into the formula to find

\[
f''(3) \approx \frac{\sin(2.8) - 2\sin(3) + \sin(3.2)}{(.2)^2} = 25\left[ \sin(2.8) - 2\sin(3) + \sin(3.2) \right].
\]

We conclude that $f(x) = \sin x$.
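As a quick sanity check on the identification, the approximation can be evaluated and compared with the exact second derivative $-\sin(3)$. This is a small Octave sketch; the variable names are illustrative.

```octave
% Central second-difference formula with f = sin, x0 = 3, h = .2
h = 0.2; x0 = 3;
approx = (sin(x0 - h) - 2*sin(x0) + sin(x0 + h)) / h^2;   % 1/h^2 = 25
exact  = -sin(x0);                                        % (sin x)'' = -sin x
printf("approx = %.6f, exact = %.6f, error = %.2e\n", approx, exact, abs(approx - exact));
```

The error is on the order of $h^2 = .04$ times a small constant, in line with the $O(h^2 f^{(4)}(\xi_h))$ error term.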


23c: First, we need to identify the formula being used. Since this is a third derivative formula with $x_0 = 3$ and evaluations of $f$ at 3, 3.01, 3.02, 3.03, 3.04, this is a five-point formula with $h = .01$. The formula used is this one from table 4.4:

\[
f'''(x_0) = \frac{-5f(x_0) + 18f(x_0 + h) - 24f(x_0 + 2h) + 14f(x_0 + 3h) - 3f(x_0 + 4h)}{2h^3} + O(h^2 f^{(5)}(\xi_h))
\]

so the error term is $O(h^2 f^{(5)}(\xi_h))$. The error is, therefore, bounded by

\[
k(.01)^2 \max_{x \in [3, 3.04]} \left| f^{(5)}(x) \right|
\]

for some constant $k$ dependent on the method, not the function $f$ or the nodes used. Now,

\[
\max_{x \in [3, 3.04]} \left| f^{(5)}(x) \right| = \max_{x \in [3, 3.04]} |\cos(x)| = |\cos(3.04)|.
\]

A bound on the error is, therefore, $0.0001k|\cos(3.04)|$ or $9.9485(10)^{-5}k$ for some $k$ dependent on the method.

23f: First, we need to identify the formula being used. The unusual points of evaluation in the approximation identify it quickly as

\[
\int_{x_0 - h}^{x_0 + h} f(x)\,dx = h\left[ f\left(x_0 - \tfrac{1}{\sqrt{3}}h\right) + f\left(x_0 + \tfrac{1}{\sqrt{3}}h\right) \right] + O(h^5 f^{(4)}(\xi_h))
\]

with $x_0 = 3.5$, $h = 0.5$, and error term $O(h^5 f^{(4)}(\xi_h))$. The error is, therefore, bounded by

\[
k(.5)^5 \max_{x \in [3, 4]} \left| f^{(4)}(x) \right|
\]

for some constant $k$ dependent on the method, not the function $f$ or the nodes used. Now,

\[
\max_{x \in [3, 4]} \left| f^{(4)}(x) \right| = \max_{x \in [3, 4]} |\sin(x)| = |\sin(4)|.
\]

A bound on the error is, therefore, $0.03125k|\sin(4)|$ or $0.023651k$ for some $k$ dependent on the method.
24: (a) We are given only 5 nodes, so we must use them all for each approximation. The nodes are (thankfully) evenly spaced so we can use one of the formulas in table 4.2. There are two nodes to the left of 2 and two to the right, so we need to use the five-point formula with nodes $x_0 - 2h$, $x_0 - h$, $x_0$, $x_0 + h$, and $x_0 + 2h$ to approximate $f'(2)$. All four of the nodes other than 4 are to the left of 4 so we need to use the five-point formula with nodes $x_0 - 4h$, $x_0 - 3h$, $x_0 - 2h$, $x_0 - h$, and $x_0$ to approximate $f'(4)$. Hence,

\[
\begin{aligned}
f'(2) &\approx \frac{-.2381 - 8(-.3125) + 8(-.8333) - (-5)}{12(1)} = 0.049625 \\
f'(4) &\approx \frac{3(-.2381) - 16(-.3125) + 36(-.4545) - 48(-.8333) + 25(-5)}{12(1)} = -8.089825.
\end{aligned}
\]


(b) We should expect the approximation of $f'(2)$ to be better because the error term for the formula used is $\frac{h^4}{30}f^{(5)}(\xi_h)$ where the error term for the formula used in approximating $f'(4)$ is six times greater.

Another reason we should expect the $f'(2)$ approximation to be better is because 2 is centrally located amongst the nodes where 4 is as far from centrally located as possible!

(c) $f'(x) = -\frac{1}{(x - 4.2)^2}$ so $f'(2) = -\frac{1}{4.84}$ and $f'(4) = -25$. The absolute errors are

\[
\begin{aligned}
|f'(2) - 0.049625| &\approx 0.2562365702479338 \\
|f'(4) - (-8.089825)| &= 16.910175
\end{aligned}
\]

and the relative errors are

\[
\begin{aligned}
\frac{0.2562365702479338}{|f'(2)|} &\approx 1.240185 \\
\frac{16.910175}{|f'(4)|} &\approx 0.6764070000000001.
\end{aligned}
\]

So, as expected, the absolute error in the approximation of $f'(2)$ is smaller than that of $f'(4)$, but the relative errors, which are perhaps more important, are exactly the opposite in comparison!
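The two five-point approximations can be reproduced directly from the tabulated values. This is a minimal Octave sketch; it assumes the five nodes are $x = 0, 1, 2, 3, 4$ (so $h = 1$), which is consistent with the denominators $12(1)$ used above, and the variable names are illustrative.

```octave
% Tabulated values of f at the (assumed) nodes x = 0, 1, 2, 3, 4
y = [-.2381, -.3125, -.4545, -.8333, -5];
h = 1;

% Centered five-point formula at x0 = 2 (nodes x0-2h, ..., x0+2h)
d2 = (y(1) - 8*y(2) + 8*y(4) - y(5)) / (12*h);

% One-sided five-point formula at x0 = 4 (nodes x0-4h, ..., x0)
d4 = (3*y(1) - 16*y(2) + 36*y(3) - 48*y(4) + 25*y(5)) / (12*h);

printf("f'(2) ~ %.6f\nf'(4) ~ %.6f\n", d2, d4);
```

The centered formula uses no value at $x_0$ itself, which is why $y(3)$ does not appear in the first computation.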


33: The function shown below (A = 2.584739179873929) is one example. 



The area of trapezoid CDEF represents the approximation by the trapezoidal rule (which is where it gets 
its name). The function f(x) was chosen so that the two brownish areas are (very nearly) equal, one above 
line segment CD and one below. This means the trapezoidal rule approximation will be (very nearly) exact. 
Moreover, since the point A is not on line segment CD, the approximation by Simpson’s rule will not be (very 
nearly) exact. Other examples can be created similarly. To summarize, any example of a smooth function 
where the following occur will work. 

• The areas above and below the line segment from $(0, f(0))$ to $(1, f(1))$ are equal.

• $(.5, f(.5))$ does not lie on the line segment from $(0, f(0))$ to $(1, f(1))$.
REMARK: Non-smooth functions with the two properties above also provide examples. The reason we chose to give a smooth example is because the errors for non-smooth functions are completely unpredictable (since they don’t possess the required number of derivatives), and, hence, it is not as surprising in that case that we can find examples where the trapezoidal rule outdoes Simpson’s rule. The trapezoidal rule and Simpson’s rule cannot be applied reliably to functions without sufficient derivatives.

REMARK: The question did not request a formula, so any hand-sketched graph with the two properties above would suffice. Since we have a formula, however, we can demonstrate the result numerically. For the function $f$ pictured above,

\[
\begin{aligned}
\int_0^1 f(x)\,dx &\approx 3.443097449311693 \\
\text{Trapezoidal Rule} = \frac{f(0) + f(1)}{2} &\approx 3.443097449311694 \\
\text{Simpson's Rule} = \frac{f(0) + 4f(.5) + f(1)}{6} &\approx 3.632535470843161.
\end{aligned}
\]

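34: Five-point formulas for the 2nd derivative have error term $O(h^3 f^{(5)}(\xi_h))$ or $O(h^4 f^{(6)}(\xi_h))$ so $E_{.1} = k(.1)^3 f^{(5)}(\xi_{.1})$ or $E_{.1} = k(.1)^4 f^{(6)}(\xi_{.1})$ and $E_{.02} = k(.02)^3 f^{(5)}(\xi_{.02})$ or $E_{.02} = k(.02)^4 f^{(6)}(\xi_{.02})$. Assuming $f^{(5)}(\xi_{.1}) \approx f^{(5)}(\xi_{.02})$ if the error term is $O(h^3 f^{(5)}(\xi_h))$ or that $f^{(6)}(\xi_{.1}) \approx f^{(6)}(\xi_{.02})$ if the error term is $O(h^4 f^{(6)}(\xi_h))$, we should expect

\[
\frac{E_{.1}}{E_{.02}} = \frac{k(.1)^3 f^{(5)}(\xi_{.1})}{k(.02)^3 f^{(5)}(\xi_{.02})} \approx \left( \frac{.1}{.02} \right)^3 = 125
\]

or

\[
\frac{E_{.1}}{E_{.02}} = \frac{k(.1)^4 f^{(6)}(\xi_{.1})}{k(.02)^4 f^{(6)}(\xi_{.02})} \approx \left( \frac{.1}{.02} \right)^4 = 625.
\]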

Section 4.4 

1a: Divide the interval of integration, [1, 3], into 3 subintervals of equal length and apply the midpoint rule to each of the subintervals. The sum of the three estimates is the answer.

interval                                midpoint rule
$[1, 1+\frac{2}{3}]$                    $\frac{2}{3}\ln(\sin(1+\frac{1}{3})) \approx -0.0189755760325961$
$[1+\frac{2}{3}, 2+\frac{1}{3}]$        $\frac{2}{3}\ln(\sin(2)) \approx -0.06338869073010707$
$[2+\frac{1}{3}, 3]$                    $\frac{2}{3}\ln(\sin(2+\frac{2}{3})) \approx -0.5216503391783174$

\[
\int_1^3 \ln(\sin(x))\,dx \approx -0.6040146059410205
\]

2a: Divide the interval of integration, [1, 3], into 3 subintervals of equal length and apply the trapezoidal rule to each of the subintervals. The sum of the three estimates is the answer.

interval                                trapezoidal rule
$[1, 1+\frac{2}{3}]$                    $\frac{1}{3}\left(\ln(\sin(1)) + \ln(\sin(1+\frac{2}{3}))\right) \approx -0.05906878811071457$
$[1+\frac{2}{3}, 2+\frac{1}{3}]$        $\frac{1}{3}\left(\ln(\sin(1+\frac{2}{3})) + \ln(\sin(2+\frac{1}{3}))\right) \approx -0.1096099655624244$
$[2+\frac{1}{3}, 3]$                    $\frac{1}{3}\left(\ln(\sin(2+\frac{1}{3})) + \ln(\sin(3))\right) \approx -0.7607906360781023$

\[
\int_1^3 \ln(\sin(x))\,dx \approx -0.9294693897512412
\]


3a: Divide the interval of integration, [1, 3], into 3 subintervals of equal length and apply Simpson’s rule to each of the subintervals. The sum of the three estimates is the answer. Let $f(x) = \ln(\sin(x))$.

interval                                Simpson’s rule
$[1, 1+\frac{2}{3}]$                    $\frac{1}{9}\left(f(1) + 4f(1+\frac{1}{3}) + f(1+\frac{2}{3})\right) \approx -0.03233998005863559$
$[1+\frac{2}{3}, 2+\frac{1}{3}]$        $\frac{1}{9}\left(f(1+\frac{2}{3}) + 4f(2) + f(2+\frac{1}{3})\right) \approx -0.0787957823408795$
$[2+\frac{1}{3}, 3]$                    $\frac{1}{9}\left(f(2+\frac{1}{3}) + 4f(2+\frac{2}{3}) + f(3)\right) \approx -0.6013637714782457$

\[
\int_1^3 \ln(\sin(x))\,dx \approx -0.7124995338777608
\]
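The three subinterval computations for the midpoint, trapezoidal, and Simpson's rules can be reproduced with a short loop. This is a minimal Octave sketch of that bookkeeping (variable names are illustrative), not the code used to produce the values in the text.

```octave
f = @(x) log(sin(x));
edges = linspace(1, 3, 4);        % subinterval endpoints 1, 5/3, 7/3, 3

midpoint = 0; trapezoid = 0; simpson = 0;
for i = 1:3
  a = edges(i); b = edges(i + 1); m = (a + b)/2;
  midpoint  = midpoint  + (b - a) * f(m);                       % midpoint rule
  trapezoid = trapezoid + (b - a)/2 * (f(a) + f(b));            % trapezoidal rule
  simpson   = simpson   + (b - a)/6 * (f(a) + 4*f(m) + f(b));   % Simpson's rule
end

printf("midpoint  %.10f\ntrapezoid %.10f\nsimpson   %.10f\n", midpoint, trapezoid, simpson);
```

On each subinterval the Simpson estimate equals two thirds of the midpoint estimate plus one third of the trapezoidal estimate, which offers an easy cross-check of the three tables.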


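4a: Divide the interval of integration, [1, 3], into 3 subintervals of equal length and apply Simpson’s $\frac{3}{8}$ rule to each of the subintervals. The sum of the three estimates is the answer. Let $f(x) = \ln(\sin(x))$.

interval                                Simpson’s $\frac{3}{8}$ rule
$[1, 1+\frac{2}{3}]$                    $\frac{1}{12}\left(f(1) + 3f(1+\frac{2}{9}) + 3f(1+\frac{4}{9}) + f(1+\frac{2}{3})\right) \approx -0.03227403251196553$
$[1+\frac{2}{3}, 2+\frac{1}{3}]$        $\frac{1}{12}\left(f(1+\frac{2}{3}) + 3f(1+\frac{8}{9}) + 3f(2+\frac{1}{9}) + f(2+\frac{1}{3})\right) \approx -0.07868946204953159$
$[2+\frac{1}{3}, 3]$                    $\frac{1}{12}\left(f(2+\frac{1}{3}) + 3f(2+\frac{5}{9}) + 3f(2+\frac{7}{9}) + f(3)\right) \approx -0.5965852934114506$

\[
\int_1^3 \ln(\sin(x))\,dx \approx -0.7075487879729477
\]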


5a: Divide the interval of integration, [1, 3], into 3 subintervals of equal length and apply the quadrature rule to each of the subintervals. The sum of the three estimates is the answer. Let $f(x) = \ln(\sin(x))$.

interval                                quadrature rule
$[1, 1+\frac{2}{3}]$                    $\frac{1}{3}\left(f(1+\frac{2}{9}) + f(1+\frac{4}{9})\right) \approx -0.02334244731238252$
$[1+\frac{2}{3}, 2+\frac{1}{3}]$        $\frac{1}{3}\left(f(1+\frac{8}{9}) + f(2+\frac{1}{9})\right) \approx -0.068382627545234$
$[2+\frac{1}{3}, 3]$                    $\frac{1}{3}\left(f(2+\frac{5}{9}) + f(2+\frac{7}{9})\right) \approx -0.5418501791892335$

\[
\int_1^3 \ln(\sin(x))\,dx \approx -0.63357525404685
\]


7: The trapezoidal rule applied to $\int_0^\pi \sin^4 x\,dx$ gives

\[
\frac{\pi}{2}\left(\sin^4(0) + \sin^4(\pi)\right) = 0,
\]

which has absolute error $\frac{3}{8}\pi$. Since the trapezoidal rule has error term $O\left(\frac{1}{n^2}\right)$, dividing the interval of integration into $n$ subintervals should decrease the error by a factor of about $n^2$. Therefore, we need to solve the equation $\frac{\frac{3}{8}\pi}{n^2} = 10^{-4}$:

\[
\begin{aligned}
\frac{\frac{3}{8}\pi}{n^2} &= 10^{-4} \\
n^2 &= \frac{\frac{3}{8}\pi}{10^{-4}} \\
n &= \sqrt{\frac{\frac{3}{8}\pi}{10^{-4}}} \approx 108.5.
\end{aligned}
\]

Increasing the number of intervals by a factor of 109 should do the trick. Since our initial estimate used but one interval, we need to use 109 intervals to achieve $10^{-4}$ accuracy.
15: Let $S_k(a, b)$ mean applying composite Simpson’s rule to the interval $[a, b]$ with $k$ subintervals and $e_k$ mean the error in $S_k(a, b)$. We now repeat the analysis we did in deriving the adaptive trapezoidal rule but applied to Simpson’s rule:

\[
e_n \approx \frac{M}{n^4} \quad\text{and}\quad e_{2n} \approx \frac{M}{(2n)^4} \quad\text{so}\quad \frac{e_n}{e_{2n}} \approx 16, \text{ which implies } e_n \approx 16e_{2n}.
\]

Because $\int_a^b f(x)\,dx = S_2(a, b) + e_2 = S_1(a, b) + e_1$,

\[
\begin{aligned}
S_2(a, b) - S_1(a, b) &= e_1 - e_2 \\
&\approx 16e_2 - e_2 \\
&= 15e_2
\end{aligned}
\]

so $e_2 \approx \frac{1}{15}(S_2(a, b) - S_1(a, b))$. Explicitly,

\[
\int_a^b f(x)\,dx - S_2(a, b) \approx \frac{1}{15}\left(S_2(a, b) - S_1(a, b)\right).
\]

Now we know what quantity to use in order to estimate the error. We tabulate the necessary computations:

a     b     $S_1(a,b)$    $S_2(a,b)$    $\frac{1}{15}|S_2(a,b) - S_1(a,b)|$    tol      accept?
1     3     -0.837026     -0.730741     0.00708                                 .002     ✗
1     2     -0.046286     -0.045560     $4.8(10)^{-5}$                          .001     ✓
2     3     -0.684454     -0.661383     0.00153                                 .001     ✗
2     2.5   -0.134349     -0.134243     $7.0(10)^{-6}$                          .0005    ✓
2.5   3     -0.527034     -0.523129     0.00026                                 .0005    ✓

\[
\int_1^3 \ln(\sin(x))\,dx \approx -0.045560 - 0.134243 - 0.523129 = -0.702932
\]
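The tabulated process can be sketched in Octave as follows. This is a minimal illustration under stated assumptions, not the text's own implementation: $S_1$ is taken to be a single application of Simpson's rule, $S_2$ is Simpson's rule on each half, the starting tolerance is .002, and the tolerance is halved whenever an interval is split. A work list is used in place of recursion, and all names are illustrative.

```octave
f = @(x) log(sin(x));
simpson = @(a, b) (b - a)/6 * (f(a) + 4*f((a + b)/2) + f(b));

work = [1, 3, 0.002];      % each row holds [a, b, tol] for a pending interval
total = 0;
while ~isempty(work)
  a = work(1, 1); b = work(1, 2); tol = work(1, 3);
  work(1, :) = [];                           % remove the interval being processed
  m = (a + b)/2;
  S1 = simpson(a, b);                        % one application
  S2 = simpson(a, m) + simpson(m, b);        % two applications
  if abs(S2 - S1)/15 < tol
    total = total + S2;                      % estimate accepted
  else
    work = [a, m, tol/2; m, b, tol/2; work]; % split, halving the tolerance
  end
end

printf("adaptive Simpson estimate: %.6f\n", total);
```

With these choices the accepted intervals are [1, 2], [2, 2.5], and [2.5, 3], matching the rows marked as accepted in the table.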


23: First,

\[
\begin{aligned}
\int_0^1 \ln(x + 1)\,dx &= \left[(x + 1)\ln(x + 1) - x - 1\right]_0^1 \\
&= 2\ln 2 - 2 - (-1) \\
&= 2\ln 2 - 1 \\
&\approx 0.3862943611198906.
\end{aligned}
\]

Now we need to get an estimate using the composite trapezoidal rule with a small number of intervals, say 10 or 20. This part of the computation is mere speculation. Really, any number of intervals that will not give the desired accuracy will suffice:

\[
T_{10}(0, 1) = 0.385877936745754.
\]

The error with 10 subintervals is

\[
|0.3862943611198906 - 0.385877936745754| \approx 4.16424374136581(10)^{-4}.
\]

Since the error term for the composite trapezoidal rule (assuming $f''(\xi_h)$ is constant, as we do in deriving the adaptive method) is $O\left(\frac{1}{n^2}\right)$, we expect the error to decrease by a factor of $n^2$ as the number of intervals is increased by a factor of $n$. The needed factor of decrease is

\[
\frac{10^{-6}}{4.16424374136581(10)^{-4}} \approx 0.00240139641699267.
\]

Therefore, the necessary factor of increase is

\[
\sqrt{\frac{1}{0.00240139641699267}} \approx 20.406.
\]

Our “test” calculation used 10 intervals, so we need to use $10 \cdot 20.406 = 204.06$, or rounding up, 205 intervals to achieve $10^{-6}$ accuracy.
REMARK: Another way to find the necessary factor of increase is to solve the equation

\[
\frac{4.16424374136581(10)^{-4}}{n^2} = 10^{-6}.
\]

This comes from the fact that increasing the number of intervals by a factor of $n$ decreases the error by a factor of $n^2$. Thus we take the known error (of $T_{10}(0, 1)$), divide by $n^2$ and set it equal to the desired accuracy, $10^{-6}$. The solution, of course, is $n \approx 20.406$, the factor of increase.

REMARK: We have used the Octave code

####################################################
# Written by Dr. Len Brin 2 April 2012             #
# MAT 322 Numerical Analysis I                     #
# Purpose: Implementation of composite Trapezoidal #
#          rule                                    #
# INPUT: function f, interval endpoints a and b,   #
#        number of subintervals n                  #
# OUTPUT: approximate integral of f(x) from a to b #
####################################################
function integral = compositeTrapezoidal(f,a,b,n)
  h = (b-a)/n;
  s = 0;
  for i = 1:n-1
    s = s + f(a+i*h);
  end#for
  integral = h*(f(a)+2*s+f(b))/2;
end#function

to calculate $T_{10}(0, 1)$:

>> f=inline('log(x+1)');
>> compositeTrapezoidal(f,0,1,10)
ans = 0.385877936745754

compositeTrapezoidal.m may be downloaded at the companion website.

REMARK: Using the code above to calculate the approximation with 205 subintervals:

>> compositeTrapezoidal(f,0,1,205)
ans = 0.386293369647938

and it has error

>> 0.3862943611198906-ans
ans = 9.91471952871414e-07

just less than $10^{-6}$.


Section 4.5 

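7: We need to combine $N(h)$, $N(\frac{h}{2})$, and $N(\frac{h}{3})$ so that terms involving $h$ and $h^2$ vanish, leaving $h^3$ as the lowest order term.

\[
\begin{aligned}
N(h) &= M - K_1 h - K_2 h^2 - K_3 h^3 - \cdots \\
N\left(\tfrac{h}{2}\right) &= M - \tfrac{1}{2}K_1 h - \tfrac{1}{4}K_2 h^2 - \tfrac{1}{8}K_3 h^3 - \cdots \\
N\left(\tfrac{h}{3}\right) &= M - \tfrac{1}{3}K_1 h - \tfrac{1}{9}K_2 h^2 - \tfrac{1}{27}K_3 h^3 - \cdots
\end{aligned}
\]

so $N(h) + aN(\frac{h}{2}) + bN(\frac{h}{3})$ is

\[
(1 + a + b)M - \left(1 + \tfrac{a}{2} + \tfrac{b}{3}\right)K_1 h - \left(1 + \tfrac{a}{4} + \tfrac{b}{9}\right)K_2 h^2 - \left(1 + \tfrac{a}{8} + \tfrac{b}{27}\right)K_3 h^3 - \cdots.
\]

Therefore, we need to find $a$ and $b$ such that

\[
\begin{aligned}
1 + \frac{a}{2} + \frac{b}{3} &= 0 \\
1 + \frac{a}{4} + \frac{b}{9} &= 0.
\end{aligned}
\]

The solution of the system is $a = -8$ and $b = 9$. Calculating,

\[
N(h) - 8N\left(\tfrac{h}{2}\right) + 9N\left(\tfrac{h}{3}\right) = 2M + O(h^3)
\]

so our $O(h^3)$ estimate for $M$ is

\[
\frac{N(h) - 8N\left(\frac{h}{2}\right) + 9N\left(\frac{h}{3}\right)}{2}.
\]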
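The small linear system for $a$ and $b$ can be checked in Octave. This is a minimal sketch with illustrative variable names; it also evaluates the coefficients of $M$ and of $K_3 h^3$ to confirm that $M$ survives with coefficient 2 and that the $h^3$ term does not vanish.

```octave
% Conditions: 1 + a/2 + b/3 = 0 and 1 + a/4 + b/9 = 0
A   = [1/2, 1/3; 1/4, 1/9];
rhs = [-1; -1];
ab  = A \ rhs;
a = ab(1); b = ab(2);

coefM  = 1 + a + b;          % coefficient of M in the combination
coefK3 = 1 + a/8 + b/27;     % coefficient multiplying K_3 h^3
printf("a = %g, b = %g, coefficient of M = %g, coefficient on K3 = %g\n", a, b, coefM, coefK3);
```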


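REMARK: We can work directly by Richardson’s extrapolation (at least to begin) as well. Using Richardson’s extrapolation with $\alpha = \frac{1}{2}$ and $m_1 = 1$, we can combine $N(h)$ and $N(\frac{h}{2})$ to get an $O(h^2)$ approximation:

\[
N_1(h) = 2N\left(\tfrac{h}{2}\right) - N(h).
\]

Using Richardson’s extrapolation with $\alpha = \frac{2}{3}$ and $m_1 = 1$, we can combine $N(\frac{h}{2})$ and $N(\frac{h}{3})$ to get another $O(h^2)$ approximation:

\[
N_1\left(\tfrac{h}{2}\right) = \frac{\frac{3}{2}N\left(\frac{h}{3}\right) - N\left(\frac{h}{2}\right)}{\frac{1}{2}} = 3N\left(\tfrac{h}{3}\right) - 2N\left(\tfrac{h}{2}\right).
\]

Both $N_1(h)$ and $N_1(\frac{h}{2})$ are $O(h^2)$ approximations, so we can combine them to get the $O(h^3)$ approximation. Unfortunately, the Richardson’s extrapolation formula does not apply. It assumes the same constants in each approximation. But the general idea does. We need to combine these approximations

\[
\begin{aligned}
N_1(h) &= M + \tfrac{1}{2}K_2 h^2 + \tfrac{3}{4}K_3 h^3 + \cdots \\
N_1\left(\tfrac{h}{2}\right) &= M + \tfrac{1}{6}K_2 h^2 + \tfrac{5}{36}K_3 h^3 + \cdots
\end{aligned}
\]

to eliminate the $h^2$ term. By inspection, we need $3N_1(\frac{h}{2}) - N_1(h)$:

\[
3N_1\left(\tfrac{h}{2}\right) - N_1(h) = 2M - \tfrac{1}{3}K_3 h^3 + \cdots.
\]

Therefore, the $O(h^3)$ approximation for $M$ we are looking for is

\[
\begin{aligned}
N_2(h) &= \frac{3N_1\left(\frac{h}{2}\right) - N_1(h)}{2} \\
&= \frac{3\left[3N\left(\frac{h}{3}\right) - 2N\left(\frac{h}{2}\right)\right] - \left[2N\left(\frac{h}{2}\right) - N(h)\right]}{2} \\
&= \frac{N(h) - 8N\left(\frac{h}{2}\right) + 9N\left(\frac{h}{3}\right)}{2}.
\end{aligned}
\]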


8: For the first extrapolation, we use formula 4.5.4 with $\alpha = \frac{1}{2}$ and $m_1 = 2$:

\[
N_1(h) = \frac{4N\left(\frac{h}{2}\right) - N(h)}{3}
\]

which leaves $N_1(h) = M + l_2 h^4 + l_3 h^6 + \cdots$. We get a second round of refinements from formula 4.5.4 with $\alpha = \frac{1}{2}$ and $m_1 = 4$:

\[
N_2(h) = \frac{16N_1\left(\frac{h}{2}\right) - N_1(h)}{15}
\]

which leaves $N_2(h) = M + c_3 h^6 + \cdots$. We get a third round of refinements from formula 4.5.4 with $\alpha = \frac{1}{2}$ and $m_1 = 6$:

\[
N_3(h) = \frac{64N_2\left(\frac{h}{2}\right) - N_2(h)}{63}.
\]

Tabulating the computation, it goes something like this:

$N$            $N_1$        $N_2$         $N_3$
2.356194
-0.4879837     -1.436042
-0.8815732     -1.012769    -0.9845514
-0.9709157     -1.000696    -0.9998916    -1.000135

The third Richardson extrapolation is $-1.000135$. Not bad considering the exact value of the integral is $-1$.
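The table can be rebuilt from its first column with a pair of loops. This is a minimal Octave sketch (the array name `T` and loop layout are illustrative); column $j$ applies the extrapolation with $\alpha = \frac{1}{2}$ and exponent $2(j-1)$, so the multiplier is $4^{j-1}$.

```octave
% First column: the four original approximations N(h), N(h/2), N(h/4), N(h/8)
N = [2.356194; -0.4879837; -0.8815732; -0.9709157];

T = zeros(4, 4);
T(:, 1) = N;
for j = 2:4
  p = 4^(j - 1);                       % 4, 16, 64 for the three rounds
  for i = j:4
    T(i, j) = (p*T(i, j-1) - T(i-1, j-1)) / (p - 1);
  end
end

printf("third Richardson extrapolation: %.6f\n", T(4, 4));
```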

10: To summarize the method, let $N_0(k) = T_k(1, 3)$, the trapezoidal rule itself applied with $k$ subintervals. Then since the error of the trapezoidal rule only contains even powers,

\[
N_j(k) = \frac{4^j N_{j-1}(2k) - N_{j-1}(k)}{4^j - 1}
\]

for $j = 1, 2, \ldots$. To six significant figures, the following table summarizes the process.

k     $N_0(k)$     $N_1(k)$     $N_2(k)$     $N_3(k)$     $N_4(k)$     $N_5(k)$     $N_6(k)$
1     -2.13074     -0.837026    -0.723655    -0.705067    -0.702555    -0.702340    -0.702330
2     -1.16045     -0.730741    -0.705358    -0.702564    -0.702340    -0.702330
4     -0.838170    -0.706944    -0.702608    -0.702341    -0.702330
8     -0.739751    -0.702879    -0.702345    -0.702330
16    -0.712097    -0.702378    -0.702330
32    -0.704808    -0.702333
64    -0.702952

To (Octave) machine accuracy,

\[
\int_1^3 \ln(\sin(x))\,dx \approx -0.702330215031025.
\]
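The whole table can be generated with a few lines of Octave. This is a minimal sketch of the process summarized above, with illustrative names; row $i$ of `R` corresponds to $k = 2^{i-1}$ subintervals and column $j$ to $N_{j-1}$.

```octave
f = @(x) log(sin(x));
a = 1; b = 3;
levels = 7;                              % k = 1, 2, 4, ..., 64

R = zeros(levels, levels);
for i = 1:levels
  k = 2^(i - 1);
  x = linspace(a, b, k + 1);
  y = f(x);
  % composite trapezoidal rule with k subintervals
  R(i, 1) = (b - a)/k * (sum(y) - (y(1) + y(end))/2);
end
for j = 2:levels
  for i = 1:levels - j + 1
    % N_{j-1}(k) built from N_{j-2}(2k) (next row) and N_{j-2}(k) (same row)
    R(i, j) = (4^(j-1)*R(i+1, j-1) - R(i, j-1)) / (4^(j-1) - 1);
  end
end

printf("N_6(1) = %.6f\n", R(1, levels));
```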


Section 5.2 


2: Since there are three points given, the spline consists of two cubic pieces. Each cubic piece has 4 coefficients, so we will need to construct a system of 8 equations in the 8 unknowns. The spline $S$ takes the form

\[
S(x) = \begin{cases}
S_1(x) = a_1 + b_1(x - 1) + c_1(x - 1)^2 + d_1(x - 1)^3, & x \in [0, 1] \\
S_2(x) = a_2 + b_2(x - 2) + c_2(x - 2)^2 + d_2(x - 2)^3, & x \in [1, 2]
\end{cases}
\]

The 8 equations come from the three sets of requirements on any free cubic spline.

Interpolation:

• $S_1(0) = -9 \Rightarrow a_1 - b_1 + c_1 - d_1 = -9$
• $S_1(1) = -13 \Rightarrow a_1 = -13$
• $S_2(1) = -13 \Rightarrow a_2 - b_2 + c_2 - d_2 = -13$
• $S_2(2) = -29 \Rightarrow a_2 = -29$

Derivative matching:

• $S_1'(1) = S_2'(1) \Rightarrow b_1 = b_2 - 2c_2 + 3d_2$
• $S_1''(1) = S_2''(1) \Rightarrow 2c_1 = 2c_2 - 6d_2$

Endpoint conditions:

• $S_1''(0) = 0 \Rightarrow 2c_1 - 6d_1 = 0$
• $S_2''(2) = 0 \Rightarrow 2c_2 = 0$

7: Since there are three points given, the spline consists of two cubic pieces. Each cubic piece has 4 coefficients, so we will need to construct a system of 8 equations in the 8 unknowns. The spline $S$ takes the form

\[
S(x) = \begin{cases}
S_1(x) = a_1 + b_1(x - 2) + c_1(x - 2)^2 + d_1(x - 2)^3, & x \in [1, 2] \\
S_2(x) = a_2 + b_2(x - 4) + c_2(x - 4)^2 + d_2(x - 4)^3, & x \in [2, 4]
\end{cases}
\]

The 8 equations come from the three sets of requirements on any clamped cubic spline.

Interpolation:

• $S_1(1) = 1 \Rightarrow a_1 - b_1 + c_1 - d_1 = 1$
• $S_1(2) = 3 \Rightarrow a_1 = 3$
• $S_2(2) = 3 \Rightarrow a_2 - 2b_2 + 4c_2 - 8d_2 = 3$
• $S_2(4) = 2 \Rightarrow a_2 = 2$

Derivative matching:

• $S_1'(2) = S_2'(2) \Rightarrow b_1 = b_2 - 4c_2 + 12d_2$
• $S_1''(2) = S_2''(2) \Rightarrow 2c_1 = 2c_2 - 12d_2$

Endpoint conditions:

• $S_1'(1) = 0 \Rightarrow b_1 - 2c_1 + 3d_1 = 0$
• $S_2'(4) = 0 \Rightarrow b_2 = 0$

9a: Following the solution outlined in the text, equation 5.2.8 gives $n - 2 = 0$ equations in the $c_i$. Equation 5.2.11 gives an equation which simplifies to

\[
-4c_1 - 2c_2 = 36.
\]

Combined with the equation $c_2 = 0$, we find $c_1 = -9$. Now we have the $a_i$ and $c_i$. The rest of the solution amounts to back-substitution. From the left endpoint condition, $d_1 = \frac{1}{3}c_1 = -3$. From second derivative matching, $d_2 = \frac{c_2 - c_1}{3} = \frac{0 - (-9)}{3} = 3$. Now we have the $d_i$. From the interpolation requirements, $b_1 = a_1 + c_1 - d_1 + 9$ and $b_2 = a_2 + c_2 - d_2 + 13$, so

\[
\begin{aligned}
b_1 &= -13 - 9 + 3 + 9 = -10 \\
b_2 &= -29 + 0 - 3 + 13 = -19.
\end{aligned}
\]

The spline is, therefore,

\[
S(x) = \begin{cases}
-13 - 10(x - 1) - 9(x - 1)^2 - 3(x - 1)^3, & x \in [0, 1] \\
-29 - 19(x - 2) + 3(x - 2)^3, & x \in [1, 2].
\end{cases}
\]

REMARK: The solution outlined in the text is not the only way to get the solution. Any method of solving the six equations involving $b_i$, $c_i$, and $d_i$ can be used.
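In the spirit of the remark, one alternative is to hand all eight conditions for the free spline through $(0, -9)$, $(1, -13)$, $(2, -29)$ to Octave's linear solver. This is a minimal sketch; the ordering of the unknowns, $(a_1, b_1, c_1, d_1, a_2, b_2, c_2, d_2)$, and the variable names are choices made here for illustration.

```octave
% Unknowns ordered as [a1 b1 c1 d1 a2 b2 c2 d2]
A = [1 -1  1 -1  0  0  0  0;    % S1(0) = -9
     1  0  0  0  0  0  0  0;    % S1(1) = -13
     0  0  0  0  1 -1  1 -1;    % S2(1) = -13
     0  0  0  0  1  0  0  0;    % S2(2) = -29
     0  1  0  0  0 -1  2 -3;    % S1'(1) = S2'(1):   b1 - b2 + 2c2 - 3d2 = 0
     0  0  2  0  0  0 -2  6;    % S1''(1) = S2''(1): 2c1 - 2c2 + 6d2 = 0
     0  0  2 -6  0  0  0  0;    % S1''(0) = 0
     0  0  0  0  0  0  2  0];   % S2''(2) = 0
rhs = [-9; -13; -13; -29; 0; 0; 0; 0];

coef = A \ rhs;
disp(coef.');
```

The result agrees with the coefficients found by back-substitution: $a = (-13, -29)$, $b = (-10, -19)$, $c = (-9, 0)$, $d = (-3, 3)$.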


9e: Following the solution outlined in the text, equation 5.2.8 gives $n - 2 = 0$ equations in the $c_i$. We cannot use equation 5.2.11 since it was derived from free endpoint conditions. Instead, we need to use the clamped endpoint conditions to come up with two equations in the $c_i$. Equation 5.2.10 gives us $b_1 = -\frac{1}{2} + \frac{-4}{3}c_1 + \frac{-2}{3}c_2$. Solving the second derivative matching equation for $d_2$, we have $d_2 = \frac{c_2 - c_1}{6}$. Substituting expressions for $b_1$, $b_2$, and $d_2$ into the first derivative matching equation, $-\frac{1}{2} - \frac{4}{3}c_1 - \frac{2}{3}c_2 = -4c_2 + 2(c_2 - c_1)$, which simplifies to $4c_1 + 8c_2 = 3$. This is our first equation in $c_i$. Now solving the left endpoint condition for $d_1$, we have $d_1 = \frac{2c_1 - b_1}{3}$. Substituting expressions for $a_1$, $b_1$, and $d_1$ into the first interpolation equation, we have

\[
3 - \left(-\tfrac{1}{2} + \tfrac{-4}{3}c_1 + \tfrac{-2}{3}c_2\right) + c_1 - \frac{2c_1 - \left(-\tfrac{1}{2} + \tfrac{-4}{3}c_1 + \tfrac{-2}{3}c_2\right)}{3} = 1,
\]

which simplifies to $11c_1 + 4c_2 = -21$. The two equations in $c_i$ can now be solved to find $c_1 = -\frac{5}{2}$ and $c_2 = \frac{13}{8}$. As with the free spline, the rest of the solution amounts to back-substitution:

\[
\begin{aligned}
b_1 &= -\frac{1}{2} + \frac{-4}{3}\left(-\frac{5}{2}\right) + \frac{-2}{3}\left(\frac{13}{8}\right) = \frac{7}{4} \\
d_1 &= \frac{2\left(-\frac{5}{2}\right) - \frac{7}{4}}{3} = -\frac{9}{4} \\
d_2 &= \frac{\frac{13}{8} - \left(-\frac{5}{2}\right)}{6} = \frac{11}{16}.
\end{aligned}
\]

The spline is, therefore,

\[
S(x) = \begin{cases}
3 + \frac{7}{4}(x - 2) - \frac{5}{2}(x - 2)^2 - \frac{9}{4}(x - 2)^3, & x \in [1, 2] \\
2 + \frac{13}{8}(x - 4)^2 + \frac{11}{16}(x - 4)^3, & x \in [2, 4].
\end{cases}
\]

REMARK: The solution outlined in the text is not the only way to get the solution. Any method of solving the six equations involving $b_i$, $c_i$, and $d_i$ can be used.


10a: >> [a,b,c,d]=naturalCubicSpline([0,1,2],[-9,-13,-29])
a =
  -13  -29
b =
  -10  -19
c =
  -9   0
d =
  -3   3


11: First, the declaration of the function must be changed. Left and right endpoint derivatives, $m_0$ and $m_n$, will be specified, so there must be additional arguments to the function. Also, the name of the function should be changed:

function [a,b,c,d] = naturalCubicSpline(x,y)

should become

function [a,b,c,d] = clampedCubicSpline(x,y,m0,mn)

The rest of the modifications involve the endpoint conditions and their effect on the equations within the function. We begin by solving the left endpoint condition for $d_1$: $b_1 + 2c_1h_1 + 3d_1h_1^2 = m_0 \Rightarrow$

\[
d_1 = \frac{m_0 - b_1 - 2c_1h_1}{3h_1^2}. \tag{6.5.6}
\]

Substituting this equation, $a_1 = y_1$, and equation 5.2.10 into 5.2.1 with $i = 1$ gives

\[
y_1 + \left(\frac{y_1 - y_2}{h_2} + \frac{2}{3}h_2c_1 + \frac{1}{3}h_2c_2\right)h_1 + c_1h_1^2 + \frac{m_0 - \left(\frac{y_1 - y_2}{h_2} + \frac{2}{3}h_2c_1 + \frac{1}{3}h_2c_2\right) - 2c_1h_1}{3h_1^2}\,h_1^3 = y_0,
\]

which simplifies as follows.

\[
\begin{aligned}
\left(\frac{y_1 - y_2}{h_2} + \frac{2}{3}h_2c_1 + \frac{1}{3}h_2c_2\right)h_1 + c_1h_1^2 + \frac{m_0 - \left(\frac{y_1 - y_2}{h_2} + \frac{2}{3}h_2c_1 + \frac{1}{3}h_2c_2\right) - 2c_1h_1}{3}\,h_1 &= y_0 - y_1 \\
\frac{y_1 - y_2}{h_2} + \frac{2}{3}h_2c_1 + \frac{1}{3}h_2c_2 + c_1h_1 + \frac{m_0 - \left(\frac{y_1 - y_2}{h_2} + \frac{2}{3}h_2c_1 + \frac{1}{3}h_2c_2\right) - 2c_1h_1}{3} &= \frac{y_0 - y_1}{h_1} \\
3\frac{y_1 - y_2}{h_2} + 2h_2c_1 + h_2c_2 + 3c_1h_1 + m_0 - \left(\frac{y_1 - y_2}{h_2} + \frac{2}{3}h_2c_1 + \frac{1}{3}h_2c_2\right) - 2c_1h_1 &= 3\frac{y_0 - y_1}{h_1} \\
2\frac{y_1 - y_2}{h_2} + 2h_2c_1 + h_2c_2 + c_1h_1 + m_0 - \left(\frac{2}{3}h_2c_1 + \frac{1}{3}h_2c_2\right) &= 3\frac{y_0 - y_1}{h_1} \\
6\frac{y_1 - y_2}{h_2} + 6h_2c_1 + 3h_2c_2 + 3c_1h_1 + 3m_0 - \left(2h_2c_1 + h_2c_2\right) &= 9\frac{y_0 - y_1}{h_1} \\
6\frac{y_1 - y_2}{h_2} + 4h_2c_1 + 2h_2c_2 + 3c_1h_1 + 3m_0 &= 9\frac{y_0 - y_1}{h_1}
\end{aligned}
\]

and finally

\[
(4h_2 + 3h_1)c_1 + 2h_2c_2 = 9\frac{y_0 - y_1}{h_1} - 6\frac{y_1 - y_2}{h_2} - 3m_0. \tag{6.5.7}
\]

The right endpoint condition, $S_n'(x_n) = m_n$, gives $b_n = m_n$. Substituting this information into 5.2.7 with $i = n$ gives $m_n = \frac{y_{n-1} - y_n}{h_n} - \frac{(c_{n-1} + 2c_n)h_n}{3}$, which simplifies to

\[
h_nc_{n-1} + 2h_nc_n = 3\left(\frac{y_{n-1} - y_n}{h_n} - m_n\right). \tag{6.5.8}
\]

Equation 6.5.7 should be reflected in the modified code on lines 21 and 22:

m(1,1)=2*(h(1)+h(2)); m(1,2)=h(2);
m(1,n+1)=3*((y(1)-y(2))/h(1)-(y(2)-y(3))/h(2));

becomes

m(1,1)=3*h(1)+4*h(2); m(1,2)=2*h(2);
m(1,n+1)=9*(y(1)-y(2))/h(1)-6*(y(2)-y(3))/h(2)-3*m0;

Equation 6.5.8 should be reflected in the modified code on line 25:

m(n,n-1)=0; m(n,n)=1; m(n,n+1)=0;

becomes

m(n,n-1)=h(n); m(n,n)=2*h(n); m(n,n+1)=3*((y(n)-y(n+1))/h(n)-mn);

The solution for the $c_i$ remains unchanged. We have only left to modify the computation of $b_1$ and $d_1$ on lines 47 and 48. $b_1$ now comes from 5.2.10, so

b(1)=(y(1)-y(2))/h(1)-2*c(1)*h(1)/3;

becomes

b(1)=(y(2)-y(3))/h(2)+2*c(1)*h(2)/3+h(2)*c(2)/3;

$d_1$ now comes from 6.5.6, so

d(1)=-c(1)/(3*h(1));

becomes

d(1)=(m0-b(1)-2*c(1)*h(1))/(3*h(1)^2);

Of course, the comments at the beginning of the function should be updated as well. The modified code, 
then, should look something like this: 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Written by Dr. Len Brin          3 June 2014   %
% Purpose: Calculation of a clamped cubic        %
%          spline.                               %
% INPUT: points (x(1),y(1)), (x(2),y(2)), ...    %
%        spline must interpolate; first          %
%        derivative at left endpoint, m0; first  %
%        derivative at right endpoint, mn.       %
% OUTPUT: coefficients of each piece of the      %
%         piecewise cubic spline:                %
%         S(i,x) = a(i)                          %
%                + b(i)*(x-x(i+1))               %
%                + c(i)*(x-x(i+1))^2             %
%                + d(i)*(x-x(i+1))^3             %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [a,b,c,d] = clampedCubicSpline(x,y,m0,mn)
  n=length(x)-1;
  for i=1:n
    h(i)=x(i)-x(i+1);
  end%for
  % Left endpoint condition:
  % m(1,1)*c(1) + m(1,2)*c(2) = m(1,n+1)
  m(1,1)=3*h(1)+4*h(2); m(1,2)=2*h(2);
  m(1,n+1)=9*(y(1)-y(2))/h(1)-6*(y(2)-y(3))/h(2)-3*m0;
  % Right endpoint condition:
  % m(n,n-1)*c(n-1) + m(n,n)*c(n) = m(n,n+1)
  m(n,n-1)=h(n); m(n,n)=2*h(n); m(n,n+1)=3*((y(n)-y(n+1))/h(n)-mn);
  % Conditions for all splines:
  for i=2:n-1
    m(i,i-1)=h(i);
    m(i,i)=2*(h(i)+h(i+1));
    m(i,i+1)=h(i+1);
    m(i,n+1)=3*((y(i)-y(i+1))/h(i)-(y(i+1)-y(i+2))/h(i+1));
  end%for
  % Solve for c(i)
  l(1)=m(1,1); u(1)=m(1,2)/l(1); z(1)=m(1,n+1)/l(1);
  for i=2:n-1
    l(i)=m(i,i)-m(i,i-1)*u(i-1);
    u(i)=m(i,i+1)/l(i);
    z(i)=(m(i,n+1)-m(i,i-1)*z(i-1))/l(i);
  end%for
  l(n)=m(n,n)-m(n,n-1)*u(n-1);
  c(n)=(m(n,n+1)-m(n,n-1)*z(n-1))/l(n);
  for i=n-1:-1:1
    c(i)=z(i)-u(i)*c(i+1);
  end%for
  % Compute a(i), b(i), d(i)
  % Endpoint conditions:
  b(1)=(y(2)-y(3))/h(2)+2*c(1)*h(2)/3+h(2)*c(2)/3;
  d(1)=(m0-b(1)-2*c(1)*h(1))/(3*h(1)^2);
  % Conditions for all splines:
  a(1)=y(2);
  for i=2:n
    d(i)=(c(i-1)-c(i))/(3*h(i));
    b(i)=(y(i)-y(i+1))/h(i)-(c(i-1)+2*c(i))*h(i)/3;
    a(i)=y(i+1);
  end%for
  b(n)=mn;
end%function

Notice the addition of the final computation, b(n)=mn. The value of b(n) from the loop is subject to floating point error. Setting $b_n$ equal to $m_n$ at the end of the program eliminates this potential variation. clampedCubicSpline.m may be downloaded at the companion website.

12b: >> [a,b,c,d]=clampedCubicSpline([1,2,4],[1,3,2],0,0)
a =
   3   2
b =
   1.75000   0.00000
c =
  -2.5000   1.6250
d =
  -2.25000   0.68750

Section 2.7 

1: (c) $g(a) = g(0) = 2$ and $g(b) = g(.9) = -.1897$ so the bracket is good. Moreover, we now know that if the value of the function is positive at any given iteration, that iteration becomes the left endpoint. Otherwise it becomes the right endpoint. Recall, the secant method when applied to a proper bracket will always produce an iteration inside the bracket, so bisection is never needed.

a     b          candidate x    x          g(x)      x becomes
0     0.9        0.82203        0.82203    -0.207    b
0     0.82203    0.74486        0.74486    -0.137    b
0     0.74486    0.69690        0.69690    -0.060    b
0     0.69690    0.67660        0.67660    -0.020    b
0     0.67660    0.66971

$|.66971 - .67660| = .00689 < .01$ so we stop with $x_5 = .66971$.
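The table can be reproduced with a short loop. This is a minimal Octave sketch of false position for this particular bracket, assuming the stopping rule used above (stop when two successive iterates differ by less than .01); it is not the text's `falsePosition` function, and the variable names are illustrative.

```octave
g = @(x) 3*x.^4 - 2*x.^3 - 3*x + 2;
a = 0; b = 0.9;            % g(a) > 0 > g(b), a proper bracket
tol = 0.01;

xold = b;
for iter = 1:100
  % x-intercept of the secant line through (a, g(a)) and (b, g(b))
  x = b - g(b)*(b - a)/(g(b) - g(a));
  if abs(x - xold) < tol
    break
  end
  if g(x) > 0
    a = x;                 % positive value: x becomes the left endpoint
  else
    b = x;                 % otherwise x becomes the right endpoint
  end
  xold = x;
end

printf("stopped after %d iterations with x = %.5f\n", iter, x);
```

Since $g(\frac{2}{3}) = 0$, the final iterate can be compared directly with the root $\frac{2}{3}$.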

(h) $f(a) = f(-20) \approx 20$ and $f(b) = f(20) \approx -17$ so the bracket is good. Moreover, we now know that if the value of the function is positive at any given iteration, that iteration becomes the left endpoint. Otherwise it becomes the right endpoint. Recall, the secant method when applied to a proper bracket will always produce an iteration inside the bracket, so bisection is never needed.

a         b         candidate x    x         f(x)     x becomes
-20       20        1.5262         1.5262    1.18     a
1.5262    20        2.7013         2.7013    -1.16    b
1.5262    2.7013    2.1186         2.1186    0.229    a
2.1186    2.7013    2.2142         2.2142    0.011    a
2.2142    2.7013    2.2189

$|2.2189 - 2.2142| = .0047 < .01$ so we stop with $x_5 = 2.2189$.

2: (c) $g(a) = g(0) = 2$ and $g(b) = g(.9) = -.1897$ so the bracket is good. Moreover, we now know that if the value of the function is positive at any given iteration, that iteration becomes the left endpoint. Otherwise it becomes the right endpoint. An * indicates that the bisection method was used due to the candidate landing outside the bracket.

a          b      candidate x    x          g(x)      x becomes
0          0.9    1.1136         0.45*      0.59      a
0.45       0.9    0.63925        0.63925    0.060     a
0.63925    0.9    0.66547        0.66547    0.0025    a
0.66547    0.9    0.66666

$|.66666 - .66547| = .00119 < .01$ so we stop with $x_4 = .66666$.

(h) $f(a) = f(-20) \approx 20$ and $f(b) = f(20) \approx -17$ so the bracket is good. Moreover, we now know that if the value of the function is positive at any given iteration, that iteration becomes the left endpoint. Otherwise it becomes the right endpoint. An * indicates that the bisection method was used due to the candidate landing outside the bracket.

a      b     candidate x    x     f(x)    x becomes
-20    20    1062.3         0*    1       a
0      20    undefined

The method is undefined beyond this point due to division by zero. The method fails.

REMARK: We will see later (question 6h) that Octave is able to handle the division by zero well enough that the method does continue, and eventually arrives at a solution!

3: (c) The secant method produces the sequence of approximations

0, .9, .82203, 1.7456, .83551, .84905, -1.6288, .83478,
.82068, .14336, .74168, .69475, .66071, .66700

at which point it stops since $|.66700 - .66071| = .00629 < .01$. The (pure) secant method takes significantly longer to converge than does its bracketed cousin. This is largely due to the fact that in the secant method, the third iteration comes from the secant method applied to .9 and .82203, the last two iterations (which do not comprise a proper bracket), whereas the third iteration in false position comes from the secant method applied to 0 and .82203 (a proper bracket).

(h) The secant method produces the sequence of approximations

-20, 20, 1.5262, 2.7013, 2.1186, 2.2142, 2.2192

at which point it stops since $|2.2192 - 2.2142| = .005 < .01$. The (pure) secant method and its bracketed cousin produce the exact same sequence of iterations. It just happens that, at each step, the secant method produces an approximation, which when paired with the previous iteration forms a proper bracket!

4: (c) Newton’s method produces the sequence of approximations 

.9, 1.1136, 1.0302, 1.0030, 1.0000 

at which point it stops since |1 - 1.003| = .003 < .01. The (pure) Newton’s method converges to a different
root, one outside the bracket! It is quick, but it fails to produce a root between 0 and .9, something that 
should not be surprising from an un-safeguarded method. 
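
For comparison, here is a minimal sketch of un-safeguarded Newton's method applied to 4(c). It is not the author's code; the names g, gp, and xs are chosen for illustration, and the stopping rule (successive approximations within the tolerance) is an assumption consistent with the hand calculation above.

```octave
% Minimal sketch of (pure) Newton's method for exercise 4(c).
g  = @(x) 3*x^4 - 2*x^3 - 3*x + 2;   % function from exercise (c)
gp = @(x) 12*x^3 - 6*x^2 - 3;        % its derivative
x = 0.9;                             % starting value
tol = 0.01;
xs = x;                              % record every approximation
while true
  xnew = x - g(x)/gp(x);             % Newton step
  xs(end+1) = xnew;
  done = abs(xnew - x) < tol;
  x = xnew;
  if done, break; end
end
disp(xs)
```

Nothing in the loop knows about the interval [0, .9], which is why the iteration is free to leave it and settle on the root at 1.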

(h) Newton’s method produces the sequence of approximations 

20, 1062.3, 3803.0, 971.14, 377.14, 2880.5, 1606.3, 330.83, 66.635, 20.301, 

-5.5823, -21.983, -10.454, -4.6688, 1.9357, 2.2550, 2.2193, 2.2191 

at which point it stops since |2.2191 - 2.2193| = .0002 < .01. The (pure) Newton’s method takes significantly
longer to converge than does its bracketed cousin! Newton’s method is allowed to wander in a seemingly 
random pattern before it comes close enough to the root to converge. Bracketing forces the iterations to 
approach much more quickly the interval in which Newton’s method will converge. 

1. Use the bracketed secant method (false position) to find a root in the indicated interval, accurate to within
$10^{-2}$.

(a) $f(x) = 3 - x - \sin x$; [2, 3] [A]

(b) $g(x) = 3x^4 - 2x^3 - 3x + 2$; [0, 1]

(c) $g(x) = 3x^4 - 2x^3 - 3x + 2$; [0, 0.9] [S]

(d) $h(x) = 10 - \cosh(x)$; [-3, -2]

(e) f(t) = v^ + Ssint - 2.5; [-600, -500] [ A ] 

(f) g ( t ) = 3 [3490,3491] 

(g) h(t) = ln(3 sin t) - f; [1,2] 

(h) $f(r) = e^{\sin r} - r$; [-20, 20] [S]

(i) $g(r) = \sin(e^r) + r$; [-3, 3]

(j) h{r ) = 2 sinr - 3 cosr ; [1,3] W 

5: (c)

>> f2=inline('3*x^4-2*x^3-3*x+2');
>> [res,i]=falsePosition(f2,0,.9,10^-6,100)
b = 0.900000000000000 

b = 0.822030415125360 

b = 0.744866113620209 

b = 0.696903242045358 

b = 0.676602659540989 

b = 0.669712929388636 

b = 0.667578776723430 

b = 0.666937771712738 

b = 0.666747069128180 

b = 0.666690496216585 

b = 0.666673727853602 

b = 0.666668758921090 

b = 0.666667286598371 

res = 0.666666850350527 

i = 13 


so $x_{13} = 0.666666850350527$ is expected to be within $10^{-6}$ of the actual root.

(h)

>> f7=inline('exp(sin(r))-r');
>> [res,i]=falsePosition(f7,-20,20,10^-6,100)
b = 20 

b = 1.52625394347853 

b = 2.70134274226916 

b = 2.11862078217644 

b = 2.21421804475756 

b = 2.21893051185485 

b = 2.21910087293432 

b = 2.21910692606145 

res = 2.21910714100071 

i = 8 


so $x_8 = 2.21910714100071$ is expected to be within $10^{-6}$ of the actual root.
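
The function falsePosition used above is the author's and is not listed here. As a rough stand-in, the following sketch shows one way the bracketed secant method could be written; the variable names, the sign test used to update the bracket, and the stopping rule are all assumptions for illustration, not a transcription of falsePosition.m.

```octave
% Hypothetical sketch of false position (bracketed secant) for exercise (h).
f = @(r) exp(sin(r)) - r;
a = -20;  b = 20;            % bracket: f(a) and f(b) have opposite signs
tol = 1e-6;  maxits = 100;
xold = b;
for i = 1:maxits
  x = b - f(b)*(b - a)/(f(b) - f(a));   % secant step through the endpoints
  if sign(f(x)) == sign(f(a))
    a = x;                   % x replaces the endpoint with the same sign
  else
    b = x;
  end
  if abs(x - xold) < tol     % assumed stopping rule
    break;
  end
  xold = x;
end
fprintf('x = %.12f after %d iterations\n', x, i);
```

Because the secant step always uses the two endpoints of the current bracket, the candidate can never land outside it, so no bisection fallback is needed in this particular method.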
6: (c)

>> f2=inline('3*x^4-2*x^3-3*x+2');
>> f2p=inline('12*x^3-6*x^2-3');
>> [res,i]=bracketedNewton(f2,f2p,0,.9,10^-6,100)
b = 0.900000000000000 

b = 0.450000000000000 

b = 0.639257968925196 

b = 0.665474256136936 

b = 0.666663994320019 

b = 0.666666666653136 

res = 0.666666666666667 

i = 6 

so $x_6 = 0.666666666666667$ is expected to be within $10^{-6}$ of the actual root.

(h) 

>> f7=inline('exp(sin(r))-r');
>> f7p=inline('exp(sin(r))*cos(r)-1');
>> [res,i]=bracketedNewton(f7,f7p,-20,20,10^-6,100)
b = 20 

b = 0 

warning: division by zero 
b = 10 

b = 3.66539525575696 

b = 1.65966535497164 

b = 2.50454805267468 

b = 2.22298743934113 

b = 2.21911019802387 

b = 2.21910714891565 

res = 2.21910714891375 

i = 9 

so $x_9 = 2.21910714891375$ is expected to be within $10^{-6}$ of the actual root.

REMARK: When we tried to compute this solution by hand (question 2h), we quit after the first iteration 
due to the division by zero. However, Octave continues, treating the undefined estimate as one that 
lands outside the bracket. Thus the second iteration is 10 (the bisection method applied to [0,20]). 


7: (c)

>> f2=inline('3*x^4-2*x^3-3*x+2');
>> [res,i]=bracketedInverseQuadratic(f2,0,.9,10^-6,100)
b = 0.900000000000000 

b = 0.822030415125360 

b = 0.411015207562680 

b = 0.729556813485380 

b = 0.629464108906733 

b = 0.671561434924253 

b = 0.666977335665865 

b = 0.666666168461076 

res = 0.666666666960237 

i = 8 


so $x_8 = 0.666666666960237$ is expected to be within $10^{-6}$ of the actual root.

(h)

>> f7=inline('exp(sin(r))-r');
>> [res,i]=bracketedInverseQuadratic(f7,-20,20,10^-6,100)
b = 20 

b = 1.52625394347854 

b = 2.70134274226916 

b = 2.11862078217644 

b = 2.21421804475756 

b = 2.21917736990638 

b = 2.21910707796098 

res = 2.21910714891272 

i = 7 

so $x_7 = 2.21910714891272$ is expected to be within $10^{-6}$ of the actual root.

10: (c)

>> f2=inline('3*x^4-2*x^3-3*x+2');
>> g2=inline('f2(x)+x');
>> [res,i]=bracketedSteffensens(g2,0,.9,10^-6,100)
b = 0.900000000000000 

b = 0.559577120523157 

b = 0.707986331365555 

b = 0.669737865924576 

b = 0.666686284030401 

b = 0.666666667476795 

res = 0.666666667666825 

i = 6 

so $x_6 = 0.666666667666825$ is expected to be within $10^{-6}$ of the actual root.

(h) 

>> f7=inline('exp(sin(r))-r');
>> g7=inline('f7(x)+x');
>> [res,i]=bracketedSteffensens(g7,-20,20,10^-6,100)
b = 20 

b = 1.80564417969925 

b = 2.18151287547235 

b = 2.21873144340028 

b = 2.21910711013891 

res = 2.21910707929096 

i = 5 

so $x_5 = 2.21910707929096$ is expected to be within $10^{-6}$ of the actual root.

13: (c)

>> f2=inline('3*x^4-2*x^3-3*x+2');
>> [res,i]=bracketedInverseQuadraticRE(f2,0,.9,10^-6,100)
b = 0.900000000000000 

b = 0.822030415125360 

b = 0.411015207562680 

b = 0.729556813485380 

b = 0.629464108906733 

b = 0.671561434924253 

b = 0.666977335665865 
b = 0.666666168461076

res = 0.666666666960237 

i = 8 

so $x_8 = 0.666666666960237$ is expected to be within $10^{-6}$ of the actual root.

(h) 

>> f7=inline('exp(sin(r))-r');
>> [res,i]=bracketedInverseQuadraticRE(f7,-20,20,10^-6,100)
b = 20 

b = 1.52625394347854 

b = 2.70134274226916 

b = 2.11862078217644 

b = 2.21421804475756 

b = 2.21917736990638 

b = 2.21910707796098 

res = 2.21910714891272 

i = 7 

so $x_7 = 2.21910714891272$ is expected to be within $10^{-6}$ of the actual root.


Section 6.1 


1d: The degree of the differential equation equals the degree of the highest degree derivative in the equation. The
only appearance of a derivative in the equation is the $f'$ term. That makes the highest degree derivative 1,
so the degree of the differential equation is 1.

2d: In the differential equation $f' + \frac{f}{x} = x^2$, both $f$ and $f'$ appear. To verify that a given function $f$ is a solution,
we need to substitute both $f$ and $f'$ into the equation. $f'$ is not given, so we calculate it:

$$f'(x) = \frac{3x^2}{4} - \frac{4}{x^2}.$$

Now that we have everything needed, we substitute $f$ and $f'$ into the differential equation and verify that the
equation is true. Substituting:

$$\left(\frac{3x^2}{4} - \frac{4}{x^2}\right) + \frac{1}{x}\left(\frac{x^3}{4} + \frac{4}{x}\right) = x^2.$$

It is not obvious that this equation is true, so we need to do a little work. To finish the verification, we must
show that the two sides are equal using algebra. Adding or subtracting or doing anything else to both sides
simultaneously supposes that the two sides are equal, so these things are not allowed! Instead, we need to
manipulate the two sides separately. Working with the left side only:

$$\begin{aligned}
\left(\frac{3x^4}{4x^2} - \frac{16}{4x^2}\right) + \left(\frac{x^3}{4x} + \frac{4}{x^2}\right) &= x^2 \\
\frac{3x^4}{4x^2} - \frac{16}{4x^2} + \frac{x^4}{4x^2} + \frac{16}{4x^2} &= x^2 \\
\frac{4x^4}{4x^2} &= x^2.
\end{aligned}$$

Almost done, but technically, this equation is not true! It is false when x = 0 because the left side is undefined
for x = 0. Luckily we do not have to worry about that case. It was given that x > 0, so we know $x \neq 0$ and
we can reduce $\frac{4x^4}{4x^2}$ to $x^2$, which finishes the verification.

3d: In order to verify that a function is a solution of an initial value problem, we need to verify that it solves the 
differential equation and satisfies the initial value requirement. 
Showing that $f(x) = \frac{x^3}{4} + \frac{16}{x}$, $x > 0$, is a solution of $f' = -\frac{f}{x} + x^2$: In the differential equation $f' + \frac{f}{x} = x^2$,
both $f$ and $f'$ appear. To verify that a given function $f$ is a solution, we need to substitute both $f$ and
$f'$ into the equation. $f'$ is not given, so we calculate it:

$$f'(x) = \frac{3x^2}{4} - \frac{16}{x^2}.$$

Now that we have everything needed, we substitute $f$ and $f'$ into the differential equation and verify
that the equation is true. Substituting:

$$\left(\frac{3x^2}{4} - \frac{16}{x^2}\right) + \frac{1}{x}\left(\frac{x^3}{4} + \frac{16}{x}\right) = x^2.$$

It is not obvious that this equation is true, so we need to do a little work. To finish the verification, we
must show that the two sides are equal using algebra. Adding or subtracting or doing anything else to
both sides simultaneously supposes that the two sides are equal, so these things are not allowed! Instead,
we need to manipulate the two sides separately. Working with the left side only:

$$\begin{aligned}
\left(\frac{3x^4}{4x^2} - \frac{64}{4x^2}\right) + \left(\frac{x^3}{4x} + \frac{16}{x^2}\right) &= x^2 \\
\frac{3x^4}{4x^2} - \frac{64}{4x^2} + \frac{x^4}{4x^2} + \frac{64}{4x^2} &= x^2 \\
\frac{4x^4}{4x^2} &= x^2.
\end{aligned}$$

Almost done, but technically, this equation is not true! It is false when x = 0 because the left side is
undefined for x = 0. Luckily we do not have to worry about that case. It was given that x > 0, so we
know $x \neq 0$ and we can reduce $\frac{4x^4}{4x^2}$ to $x^2$, which finishes the verification.

Showing that $f(4) = 20$: To show that $f$ satisfies the initial value requirement, we simply compute $f(4)$
and show that it is 20 as required. $f(4) = \frac{4^3}{4} + \frac{16}{4} = \frac{64}{4} + \frac{16}{4} = 20$.
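
A numerical spot check is no substitute for the algebra, but it is a quick way to catch a slip. The sketch below evaluates the left side minus the right side of the differential equation at a handful of positive x values, and then the initial value. The sample points and variable names are arbitrary choices made for illustration.

```octave
% Spot check (not a proof) that f(x) = x^3/4 + 16/x satisfies
% f' + f/x = x^2 for x > 0 and that f(4) = 20.
f  = @(x) x.^3/4 + 16./x;
fp = @(x) 3*x.^2/4 - 16./x.^2;        % derivative computed by hand above
x = linspace(0.5, 10, 20);            % arbitrary sample of positive x values
residual = fp(x) + f(x)./x - x.^2;    % should be zero up to rounding
disp(max(abs(residual)))
disp(f(4))
```

A residual on the order of machine precision at every sample point is what a correct verification should produce.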


4c: The given $\dot{y} = t - \sin t$ can be restated as $y'(t) = t - \sin t$. In other words, we are given the derivative of y as
a function of t. The fundamental theorem of calculus tells us that y must be the integral (antiderivative) of
the given function. That is,

$$\begin{aligned}
y(t) &= \int (t - \sin t)\,dt \\
&= \frac{1}{2}t^2 + \cos t + C.
\end{aligned}$$

So the (infinitely many) solutions of the o.d.e. are $y(t) = \frac{1}{2}t^2 + \cos t + C$.

5d: Though we could give them, this question is not asking for exact measurements of the error. It is simply 
requesting a comment on the accuracy of the approximate solution. It will suffice to compare the graphs of 
the exact solution and approximate solution over the interval covered by the approximate solution, [4,5], and 
do a calculation or two. The graph of the exact solution is a graph of the function $f(x) = \frac{x^3}{4} + \frac{16}{x}$ and the
graph of the approximate solution is a graph of the set {(4, 20), (4.25, 23), (4.5, 26), (4.75, 30), (5, 34)}:

From the graphs, the only point in the approximation that is visually separate from the graph of the exact
solution is the point (5,34). And it only misses by a small relative amount. To be more precise, the relative
error there is $\frac{|34 - 34.45|}{34.45} = \frac{9}{689} \approx 0.013$. Any general comment on the accuracy of an approximation should

take into account the requirements of the situation. In this case, there is no context to say whether we should 
hope for 10%, 1%, .1%, or smaller relative error or whether we should be more concerned about absolute 
error. Without any such context, we will simply use the visual representation, which shows the points of 
the approximation very close to the graph of the exact solution, and conclude the approximation is a good 
representation of the exact solution. 
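
If exact measurements were wanted, they are only a few lines of Octave away. The sketch below assumes the exact solution is $f(x) = \frac{x^3}{4} + \frac{16}{x}$ (the function from 3d) and computes the relative error at each of the five approximate points; the variable names are chosen for illustration.

```octave
% Relative error of the approximate solution at each of its points,
% assuming the exact solution is f(x) = x^3/4 + 16/x.
f  = @(x) x.^3/4 + 16./x;
xa = [4 4.25 4.5 4.75 5];       % x-coordinates of the approximation
ya = [20 23 26 30 34];          % approximate values
relerr = abs(ya - f(xa))./abs(f(xa));
disp([xa; ya; f(xa); relerr]')  % one row per point
```

The last entry of relerr is the 0.013 quoted above, and it is the largest of the five.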

6c: The forces acting on a stationary block on an inclined plane are gravity, friction, and the normal force of the 
surface on which it is lying. Gravity acts vertically downward. Friction acts parallel to the surface and up the 
slope since it is resisting gravity which pulls the block down the slope. The normal force acts perpendicular to 
the surface. Representing the block as a rectangle and each force by a vector, the free body diagram should 
look something like this: 



Note that the line representing the surface is NOT part of the free body diagram, so it is dashed. It is only 
there to show the (potential) direction of motion. 

6f: The forces acting on a sofa being pushed across a level floor are gravity, friction, the normal force of the floor, 
and the applied force. Gravity acts vertically downward. Friction acts parallel to the floor opposing the
applied force. The normal force acts perpendicular to the floor. And the applied force acts in an unspecified 
direction not parallel to the floor. Representing the sofa as a rectangle and each force by a vector, the free 
body diagram should look something like this: 


[Figure: free body diagram of the sofa, with force vectors labeled N, F_applied, F_friction, and mg.]


Note that the line representing the floor is NOT part of the free body diagram, so it is dashed. It is only 
there to show the direction of motion. 

6m: The forces acting on a sky diver — whether his parachute is open, closed, or in the process of opening does 
not matter — are gravity and drag (air resistance). Gravity acts vertically downward and drag acts vertically 
upward. Representing the sky diver as a rectangle and each force by a vector, the free body diagram should 
look something like this: 



[Figure: free body diagram of the sky diver, with force vectors labeled F_drag and mg.]
7c: (See solution of 6c for free body diagram) Since the block is not moving, the net force in any direction must
be zero! That makes the equation of motion s(t) = 0. The end. This answers the question asked.

In a situation where the block is moving, however, it is necessary to consider the magnitudes of the forces
acting in the direction of motion, friction and gravity. For sake of discussion, here is how they may be resolved.
The normal force acts normal to the motion so has zero tangential component. Friction is proportional to
the normal force, and by convention we use $\mu$ for the constant of proportionality, so the magnitude of friction
is $\mu N$. Adding an auxiliary line perpendicular to the surface, we see that the component of gravity in the
tangential direction is $mg\sin\alpha$.

Taking the positive direction to be down the slope, the forces acting tangential (parallel) to the surface are
$mg\sin\alpha - \mu N$. To complete the equation of motion, we need to compute $N$. Since the block does not move
in the normal direction, the net force in that direction must be zero. The only forces acting in the normal
direction are the normal force itself and a component of gravity. Therefore, $N$ must equal the magnitude
of gravity in the normal direction. Again using the auxiliary line, the component of gravity in the normal
direction is $mg\cos\alpha$. Hence $N = mg\cos\alpha$. Substituting this expression into the tangential forces, we have
$mg\sin\alpha - \mu mg\cos\alpha$ acting tangential to the surface. By Newton’s Second Law, this force must equal $ma$, so
the equation of motion is $m\ddot{s} = mg\sin\alpha - \mu mg\cos\alpha$, which simplifies to

$$\ddot{s} = g(\sin\alpha - \mu\cos\alpha).$$

This equation can be used for a block in motion down an inclined plane.

7f: (See solution of 6f for free body diagram) Both gravity and the normal force act normal to the motion, so have
zero tangential components. The only forces that act (with nonzero component) in the direction of motion
are friction and the applied force. Friction is proportional to the normal force, and by convention we use $\mu$
for the constant of proportionality, so the magnitude of friction is $\mu N$. Adding an auxiliary line parallel to
the surface, we mark the angle of the applied force and see that the component of the applied force in the
tangential direction is $F_{\text{applied}}\cos\beta$.

[Figure: free body diagram of the sofa with the auxiliary line added, with force vectors labeled N, friction, and mg.]

Taking the positive direction to be left, the forces acting tangential (parallel) to the surface are $F_{\text{applied}}\cos\beta - \mu N$.
To complete the equation of motion, we need to compute $N$. Since the sofa does not move in the
normal direction, the net force in the normal direction must be zero. The forces acting in that direction are
$N$ itself, gravity, and a component of the applied force. Therefore, in the normal direction, we must have
$N + F_{\text{applied}}\sin\beta = mg$ or $N = mg - F_{\text{applied}}\sin\beta$. Substituting this expression into the tangential forces,
we have $F_{\text{applied}}\cos\beta - \mu(mg - F_{\text{applied}}\sin\beta)$ acting tangential to the surface. By Newton’s Second Law, this
force must equal $ma$, so the equation of motion is $m\ddot{s} = F_{\text{applied}}\cos\beta - \mu(mg - F_{\text{applied}}\sin\beta)$, which simplifies
to

$$\ddot{s} = \frac{F_{\text{applied}}}{m}(\cos\beta + \mu\sin\beta) - \mu g.$$
7m: (See solution of 6m for free body diagram) Both forces in the free body diagram act in the vertical direction, so
the equation of motion is particularly simple in this case. No trigonometry is needed. $F = ma$ simply becomes
$F_{\text{drag}} - mg = m\ddot{s}$, taking upward to be the positive direction. The drag force is taken to be proportional to
speed but in the opposite direction, so $F_{\text{drag}}$ may be replaced by $-c\dot{s}$ (for some positive constant $c$) and the
equation of motion becomes, more precisely, $-c\dot{s} - mg = m\ddot{s}$. With a little bit of algebra, this equation can
be rewritten as

$$\ddot{s} + \frac{c}{m}\dot{s} + g = 0.$$

Section 6.2 

1a: Replacing the t in Euler’s Method (6.2.3) by x, Euler’s Method applied to this problem has the form
$y_{i+1} = y_i + h \cdot y'(x_i, y_i)$. Because the initial condition is $y(1) = 1$, we begin with $x_0 = 1$ and $y_0 = 1$. Then

$$\begin{aligned}
y_1 &= y_0 + 0.5(3x_0 - 2y_0) \\
&= 1 + 0.5(3(1) - 2(1)) \\
&= 1.5 \\
x_1 &= x_0 + h = 1 + 0.5 = 1.5
\end{aligned}$$

Now $x_0$ and $y_0$ can be forgotten as we compute $x_2$ and $y_2$:

$$\begin{aligned}
y_2 &= y_1 + 0.5(3x_1 - 2y_1) \\
&= 1.5 + 0.5(3(1.5) - 2(1.5)) \\
&= 2.25 \\
x_2 &= x_1 + h = 1.5 + 0.5 = 2.0
\end{aligned}$$

Therefore, we have $y(2) \approx 2.25$.

1d: Because the o.d.e. is not written in the form $y' = f(t,y)$, it is our job to rewrite it in that form, taking what
is given and solving for $y'$:

$$\begin{aligned}
\cos(x)y' + \sin(x)y &= 2\cos^3(x)\sin(x) - 1 \\
\cos(x)y' &= 2\cos^3(x)\sin(x) - 1 - \sin(x)y \\
y' &= \frac{2\cos^3(x)\sin(x) - 1 - \sin(x)y}{\cos(x)} \\
&= 2\cos^2(x)\sin(x) - \sec(x) - y\tan(x)
\end{aligned}$$

So we have $f(x,y) = 2\cos^2(x)\sin(x) - \sec(x) - y\tan(x)$. Now replacing the t in Euler’s Method (6.2.3) by x,
Euler’s Method applied to this problem has the form $y_{i+1} = y_i + h \cdot y'(x_i, y_i)$. Because the initial condition
is $y(1) = 0$, we begin with $x_0 = 1$ and $y_0 = 0$. Then

$$\begin{aligned}
y_1 &= y_0 + 0.5 f(x_0, y_0) \\
&= 0 + 0.5 f(1, 0) \\
&= 0.5(2\cos^2(1)\sin(1) - \sec(1)) \\
&\approx -0.67976011062352 \\
x_1 &= x_0 + h = 1 + 0.5 = 1.5
\end{aligned}$$

Now $x_0$ and $y_0$ can be forgotten as we compute $x_2$ and $y_2$:

$$\begin{aligned}
y_2 &= y_1 + 0.5 f(x_1, y_1) \\
&\approx -0.67976 + 0.5 f(1.5, -0.67976) \\
&\approx -2.9503939532546 \\
x_2 &= x_1 + h = 1.5 + 0.5 = 2.0
\end{aligned}$$

Therefore, we have $y(2) \approx -2.9503939532546$.
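
Hand calculations like this one are easy to check with a short loop. The following is a minimal sketch of the two Euler steps for 1d; the variable names are chosen for illustration and the loop is written out directly rather than calling any function from the companion website.

```octave
% Two steps of Euler's method for exercise 1d, h = 0.5, y(1) = 0.
f = @(x, y) 2*cos(x)^2*sin(x) - sec(x) - y*tan(x);   % y' = f(x,y)
h = 0.5;
x = 1;  y = 0;               % initial condition
for i = 1:2
  y = y + h*f(x, y);         % Euler step uses the old x and y
  x = x + h;
  fprintf('x = %.1f, y = %.13f\n', x, y);
end
```

Note the order of the two updates inside the loop: y must be advanced before x, since the step uses the slope at the old point.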
2a: For Taylor’s Method of degree 2, we will need the second derivative of y. The only thing we have to work with
is the o.d.e. itself, $\frac{dy}{dx} = 3x - 2y$. By implicit differentiation,

$$\frac{d^2y}{dx^2} = 3 - 2\frac{dy}{dx}.$$

However, this does not give us $y''$ in terms of x and y. We must substitute $\frac{dy}{dx}$ in terms of x and y. But that’s
exactly what the o.d.e. tells us! Substituting $\frac{dy}{dx} = 3x - 2y$ into the expression for $\frac{d^2y}{dx^2}$ yields

$$\begin{aligned}
\frac{d^2y}{dx^2} &= 3 - 2(3x - 2y) \\
&= 3 - 6x + 4y.
\end{aligned}$$

Now we are ready. Symbolically, Taylor’s Method of degree 2 is

$$\begin{aligned}
y_{i+1} &= y_i + h \cdot y'(x_i, y_i) + \frac{1}{2}h^2 \cdot y''(x_i, y_i) \\
x_{i+1} &= x_i + h
\end{aligned}$$

Beginning with the initial conditions, $x_0 = y_0 = 1$,

$$\begin{aligned}
y_1 &= y_0 + h \cdot y'(x_0, y_0) + \frac{1}{2}h^2 \cdot y''(x_0, y_0) \\
&= 1 + 0.5(3 \cdot 1 - 2 \cdot 1) + \frac{1}{2}(0.5)^2 \cdot (3 - 6 \cdot 1 + 4 \cdot 1) \\
&= 1.625 \\
x_1 &= x_0 + h = 1 + 0.5 = 1.5
\end{aligned}$$

Now $x_0$ and $y_0$ can be forgotten as we compute $x_2$ and $y_2$:

$$\begin{aligned}
y_2 &= y_1 + h \cdot y'(x_1, y_1) + \frac{1}{2}h^2 \cdot y''(x_1, y_1) \\
&= 1.625 + 0.5(3 \cdot 1.5 - 2 \cdot 1.625) + \frac{1}{2}(0.5)^2 \cdot (3 - 6 \cdot 1.5 + 4 \cdot 1.625) \\
&= 2.3125 \\
x_2 &= x_1 + h = 1.5 + 0.5 = 2.0
\end{aligned}$$

Therefore, we have $y(2) \approx 2.3125$.
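
Once $y'$ and $y''$ are known as functions of x and y, the method itself is a two-line loop. Here is a minimal sketch for 2a, with variable names chosen for illustration:

```octave
% Two steps of Taylor's Method of degree 2 for y' = 3x - 2y, y(1) = 1.
yp  = @(x, y) 3*x - 2*y;         % the o.d.e. itself
ypp = @(x, y) 3 - 6*x + 4*y;     % second derivative found above
h = 0.5;
x = 1;  y = 1;                   % initial condition
for i = 1:2
  y = y + h*yp(x, y) + h^2/2*ypp(x, y);   % degree 2 Taylor step
  x = x + h;
  fprintf('x = %.1f, y = %.4f\n', x, y);
end
```

All of the calculus went into writing down ypp. The loop is no harder than Euler's method.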

2d: For Taylor’s Method of degree 2, we will need the second derivative of y. The only thing we have to work
with is the o.d.e. itself (after it’s been solved for $\frac{dy}{dx}$): $\frac{dy}{dx} = 2\cos^2(x)\sin(x) - \sec(x) - y\tan(x)$. By implicit
differentiation,

$$\frac{d^2y}{dx^2} = -\tan(x)\frac{dy}{dx} - \sec(x)\tan(x) - 4\cos(x)\sin^2(x) - y\sec^2(x) + 2\cos^3(x).$$

However, this does not give us $y''$ in terms of x and y. We must substitute $\frac{dy}{dx}$ in terms of x and y. But that’s
exactly what the o.d.e. tells us! Substituting $\frac{dy}{dx} = 2\cos^2(x)\sin(x) - \sec(x) - y\tan(x)$ into the expression for
$\frac{d^2y}{dx^2}$ (and simplifying a lot!) yields

$$\frac{d^2y}{dx^2} = -y + 8\cos^3(x) - 6\cos(x)$$

Now we are ready. Symbolically, Taylor’s Method of degree 2 is

$$\begin{aligned}
y_{i+1} &= y_i + h \cdot y'(x_i, y_i) + \frac{1}{2}h^2 \cdot y''(x_i, y_i) \\
x_{i+1} &= x_i + h
\end{aligned}$$

Beginning with the initial conditions, $x_0 = 1$, $y_0 = 0$,

$$\begin{aligned}
y_1 &= y_0 + h \cdot y'(x_0, y_0) + \frac{1}{2}h^2 \cdot y''(x_0, y_0) \\
&= 0 + 0.5(2\cos^2(1)\sin(1) - \sec(1)) + \frac{1}{2}(0.5)^2 \cdot (8\cos^3(1) - 6\cos(1)) \\
&\approx -0.92725823477363 \\
x_1 &= x_0 + h = 1 + 0.5 = 1.5
\end{aligned}$$

Now $x_0$ and $y_0$ can be forgotten as we compute $x_2$ and $y_2$:

$$\begin{aligned}
y_2 &= y_1 + h \cdot y'(x_1, y_1) + \frac{1}{2}h^2 \cdot y''(x_1, y_1) \\
&\approx -0.9272 + 0.5 f(1.5, -0.9272) + \frac{1}{2}(0.5)^2 \cdot y''(1.5, -0.9272) \\
&\approx -1.3896462555267 \\
x_2 &= x_1 + h = 1.5 + 0.5 = 2.0
\end{aligned}$$

Therefore, we have $y(2) \approx -1.3896462555267$. If this exercise does not convince you that Taylor’s Methods
of degree higher than 2 are not particularly user-friendly, just wait until you try Taylor’s Method of degree 3
on this problem.

3a: For Taylor’s Method of degree 3, we will need the second and third derivatives of y. The only thing we have
to work with is the o.d.e. itself, $\frac{dy}{dx} = 3x - 2y$. By implicit differentiation,

$$\frac{d^2y}{dx^2} = 3 - 2\frac{dy}{dx}.$$

However, this does not give us $y''$ in terms of x and y. We must substitute $\frac{dy}{dx}$ in terms of x and y. But that’s
exactly what the o.d.e. tells us! Substituting $\frac{dy}{dx} = 3x - 2y$ into the expression for $\frac{d^2y}{dx^2}$ yields

$$\begin{aligned}
\frac{d^2y}{dx^2} &= 3 - 2(3x - 2y) \\
&= 3 - 6x + 4y.
\end{aligned}$$

Implicitly differentiating the equation for $\frac{d^2y}{dx^2}$ gives

$$\begin{aligned}
\frac{d^3y}{dx^3} &= -6 + 4\frac{dy}{dx} \\
&= -6 + 4(3x - 2y) \\
&= 12x - 8y - 6.
\end{aligned}$$

Now we are ready. Symbolically, Taylor’s Method of degree 3 is

$$\begin{aligned}
y_{i+1} &= y_i + h \cdot y'(x_i, y_i) + \frac{1}{2}h^2 \cdot y''(x_i, y_i) + \frac{1}{6}h^3 \cdot y'''(x_i, y_i) \\
x_{i+1} &= x_i + h
\end{aligned}$$

Beginning with the initial conditions, $x_0 = y_0 = 1$,

$$\begin{aligned}
y_1 &= y_0 + h \cdot y'(x_0, y_0) + \frac{1}{2}h^2 \cdot y''(x_0, y_0) + \frac{1}{6}h^3 \cdot y'''(x_0, y_0) \\
&= 1 + 0.5(3 \cdot 1 - 2 \cdot 1) + \frac{1}{2}(0.5)^2 \cdot (3 - 6 \cdot 1 + 4 \cdot 1) + \frac{1}{6}(0.5)^3 (12 \cdot 1 - 8 \cdot 1 - 6) \\
&\approx 1.5833333333333 \\
x_1 &= x_0 + h = 1 + 0.5 = 1.5
\end{aligned}$$

Now $x_0$ and $y_0$ can be forgotten as we compute $x_2$ and $y_2$:

$$\begin{aligned}
y_2 &= y_1 + h \cdot y'(x_1, y_1) + \frac{1}{2}h^2 \cdot y''(x_1, y_1) + \frac{1}{6}h^3 \cdot y'''(x_1, y_1) \\
&\approx 1.583 + 0.5(3 \cdot 1.5 - 2 \cdot 1.583) + \frac{1}{2}(0.5)^2 \cdot (3 - 6 \cdot 1.5 + 4 \cdot 1.583) + \frac{1}{6}(0.5)^3 (12 \cdot 1.5 - 8 \cdot 1.583 - 6) \\
&\approx 2.2777777777777 \\
x_2 &= x_1 + h = 1.5 + 0.5 = 2.0
\end{aligned}$$

Therefore, we have $y(2) \approx 2.2777777777777$.


3d: For Taylor’s Method of degree 3, we will need the second and third derivatives of y. The only thing we have
to work with is the o.d.e. itself (after it’s been solved for $\frac{dy}{dx}$): $\frac{dy}{dx} = 2\cos^2(x)\sin(x) - \sec(x) - y\tan(x)$. By
implicit differentiation,

$$\frac{d^2y}{dx^2} = -\tan(x)\frac{dy}{dx} - \sec(x)\tan(x) - 4\cos(x)\sin^2(x) - y\sec^2(x) + 2\cos^3(x).$$

However, this does not give us $y''$ in terms of x and y. We must substitute $\frac{dy}{dx}$ in terms of x and y. But that’s
exactly what the o.d.e. tells us! Substituting $\frac{dy}{dx} = 2\cos^2(x)\sin(x) - \sec(x) - y\tan(x)$ into the expression for
$\frac{d^2y}{dx^2}$ (and simplifying a lot!) yields

$$\frac{d^2y}{dx^2} = -y + 8\cos^3(x) - 6\cos(x)$$

Implicitly differentiating the equation for $\frac{d^2y}{dx^2}$ gives

$$\begin{aligned}
\frac{d^3y}{dx^3} &= -\frac{dy}{dx} - 24\cos^2(x)\sin(x) + 6\sin(x) \\
&= y\tan(x) + (6 - 26\cos^2(x))\sin(x) + \sec(x)
\end{aligned}$$

Now we are ready. Symbolically, Taylor’s Method of degree 3 is

$$\begin{aligned}
y_{i+1} &= y_i + h \cdot y'(x_i, y_i) + \frac{1}{2}h^2 \cdot y''(x_i, y_i) + \frac{1}{6}h^3 \cdot y'''(x_i, y_i) \\
x_{i+1} &= x_i + h
\end{aligned}$$

Beginning with the initial conditions, $x_0 = 1$, $y_0 = 0$,

$$\begin{aligned}
y_1 &= y_0 + h \cdot y'(x_0, y_0) + \frac{1}{2}h^2 \cdot y''(x_0, y_0) + \frac{1}{6}h^3 \cdot y'''(x_0, y_0) \\
&= 0 + 0.5(2\cos^2(1)\sin(1) - \sec(1)) + \frac{1}{2}(0.5)^2 \cdot (8\cos^3(1) - 6\cos(1)) + \frac{1}{6}(0.5)^3 \cdot (\sec(1) + (6 - 26\cos^2(1))\sin(1)) \\
&\approx -0.91657489783846 \\
x_1 &= x_0 + h = 1 + 0.5 = 1.5
\end{aligned}$$

Now $x_0$ and $y_0$ can be forgotten as we compute $x_2$ and $y_2$:

$$\begin{aligned}
y_2 &= y_1 + h \cdot y'(x_1, y_1) + \frac{1}{2}h^2 \cdot y''(x_1, y_1) + \frac{1}{6}h^3 \cdot y'''(x_1, y_1) \\
&\approx -0.9166 + 0.5 f(1.5, -0.9166) + \frac{1}{2}(0.5)^2 \cdot y''(1.5, -0.9166) + \frac{1}{6}(0.5)^3 \cdot y'''(1.5, -0.9166) \\
&\approx -1.3083937870918 \\
x_2 &= x_1 + h = 1.5 + 0.5 = 2.0
\end{aligned}$$

Therefore, we have $y(2) \approx -1.3083937870918$. If this exercise does not convince you that Taylor’s Methods
of degree higher than 2 are not particularly user-friendly, nothing will!
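
The arithmetic, at least, can be handed to Octave once the three derivatives are in hand. The sketch below carries out the two degree 3 steps for 3d; the variable names are chosen for illustration, and the three anonymous functions are the formulas derived above.

```octave
% Two steps of Taylor's Method of degree 3 for exercise 3d, h = 0.5, y(1) = 0.
yp   = @(x, y) 2*cos(x)^2*sin(x) - sec(x) - y*tan(x);        % the o.d.e.
ypp  = @(x, y) -y + 8*cos(x)^3 - 6*cos(x);                   % second derivative
yppp = @(x, y) y*tan(x) + (6 - 26*cos(x)^2)*sin(x) + sec(x); % third derivative
h = 0.5;
x = 1;  y = 0;
for i = 1:2
  y = y + h*yp(x, y) + h^2/2*ypp(x, y) + h^3/6*yppp(x, y);
  x = x + h;
  fprintf('x = %.1f, y = %.13f\n', x, y);
end
```

The code is short. The effort is all in the calculus needed to produce ypp and yppp, which is the real reason these methods are not user-friendly.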
7: Remember to document your code! In fact, the documentation for a function should almost always be written
before the function itself. Putting down in print exactly what the intended inputs and outputs of the function
will be should help guide how it is written. From the pseudo-code for Euler’s Method, the inputs are the
differential equation $\dot{y} = f(t,y)$; initial condition $y(t_0) = y_0$; numbers $t_0$ and $t_1$; and the number of steps $N$.
A reasonable comment for the beginning of the function would list all of these inputs and the output, plus
document who wrote it when and for what reason:


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Written by Leon Brin                          29 January 2012    %
% Purpose: This function implements Euler's method where the       %
%          step size is calculated and held constant.              %
% INPUT: function f(x,y); interval [a,b]; y(a); steps n            %
% OUTPUT: approximation (x(i),y(i)) of the solution of y'=f(x,y)   %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


The declaration of the function has to have the five inputs as arguments and the output as a return value.
Something like function [y,x] = eulerode(f,a,ya,b,n) should do, where ya of course is the input y(a).
The rest of the function should follow almost verbatim the pseudo-code. I’ve used x instead of t for the
independent variable. eulerode.m may be downloaded at the companion website.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Written by Leon Brin                          29 January 2012    %
% Purpose: This function implements Euler's method where the       %
%          step size is calculated and held constant.              %
% INPUT: function f(x,y); interval [a,b]; y(a); steps n            %
% OUTPUT: approximation (x(i),y(i)) of the solution of y'=f(x,y)   %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [y,x] = eulerode(f,a,ya,b,n)
  i = 1;
  x(i) = a;                 % first node is the left endpoint
  y(i) = ya;                % initial condition y(a)
  h = (b-a)/n;              % constant step size
  while (i<=n)
    y(i+1) = y(i) + h*f(x(i),y(i));   % Euler step
    x(i+1) = a + (b-a)*i/n;           % next node
    i = i+1;
  end%while
end%function


14c: The equation of motion is $\ddot{s} = g(\sin\alpha - \mu\cos\alpha)$. It is a second order differential equation with dependent
variable s and independent variable t. The $g$, $\alpha$, and $\mu$ appearing in the equation are constants. We let $u = \dot{s}$
so $\dot{u} = \ddot{s} = g(\sin\alpha - \mu\cos\alpha)$, and the first order system becomes

$$\begin{aligned}
\dot{u} &= g(\sin\alpha - \mu\cos\alpha) \\
\dot{s} &= u
\end{aligned}$$

14f: The equation of motion is $\ddot{s} = \frac{F_{\text{applied}}}{m}(\cos\beta + \mu\sin\beta) - \mu g$. It is a second order differential equation with
dependent variable s and independent variable t. The $\beta$, $m$, $F_{\text{applied}}$, $\mu$, and $g$ appearing in the equation are
constants. We let $u = \dot{s}$ so $\dot{u} = \ddot{s} = \frac{F_{\text{applied}}}{m}(\cos\beta + \mu\sin\beta) - \mu g$, and the first order system becomes

$$\begin{aligned}
\dot{u} &= \frac{F_{\text{applied}}}{m}(\cos\beta + \mu\sin\beta) - \mu g \\
\dot{s} &= u
\end{aligned}$$

14m: The equation of motion is $\ddot{s} + \frac{c}{m}\dot{s} + g = 0$. It is a second order differential equation with dependent variable
s and independent variable t. The $c$, $m$, and $g$ appearing in the equation are constants. We let $u = \dot{s}$ so
$\dot{u} = \ddot{s} = -\frac{c}{m}\dot{s} - g$, and the first order system becomes

$$\begin{aligned}
\dot{u} &= -\frac{c}{m}u - g \\
\dot{s} &= u
\end{aligned}$$




15c: The system we are solving is

$$\begin{aligned}
\dot{u} &= g(\sin\alpha - \mu\cos\alpha) \\
\dot{s} &= u
\end{aligned}$$

with initial conditions $s(0) = 0$, $\dot{s}(0) = 0$ and parameter values $g = 32.2$ ft/s$^2$, $\mu = .21$, $\alpha = .25$ rad. No
conversion of units is needed. We plug the parameter values into the system to get the initial value problem

$$\begin{aligned}
\dot{u} &= 1.41462169238826 \\
\dot{s} &= u \\
u_0 &= \dot{s}(0) = 0 \\
s_0 &= s(0) = 0
\end{aligned}$$

Applying Euler’s method to this system means iterating

$$\begin{aligned}
u_{n+1} &= u_n + h\dot{u}(u_n, s_n) = u_n + 0.25(1.41462169238826) \\
s_{n+1} &= s_n + h\dot{s}(u_n, s_n) = s_n + 0.25u_n \\
t_{n+1} &= t_n + h
\end{aligned}$$

In particular,

$$\begin{aligned}
u_1 &= u_0 + 0.25\dot{u}(u_0, s_0) \\
&= 0 + 0.25(1.41462169238826) \approx 0.353655423097065 \\
s_1 &= s_0 + 0.25u_0 \\
&= 0 + 0.25(0) = 0 \\
t_1 &= t_0 + 0.25 = .25
\end{aligned}$$

and

$$\begin{aligned}
u_2 &= u_1 + 0.25\dot{u}(u_1, s_1) \\
&\approx 0.3536 + 0.25(1.414) \approx 0.7073108461941298 \\
s_2 &= s_1 + 0.25u_1 \\
&\approx 0 + 0.25(0.3536) \approx 0.08841385577426622 \\
t_2 &= t_1 + 0.25 = .5
\end{aligned}$$

Therefore, $s(0.5) \approx 0.08841385577426622$.

15f: The system we are solving is

$$\begin{aligned}
\dot{u} &= \frac{F_{\text{applied}}}{m}(\cos\beta + \mu\sin\beta) - \mu g \\
\dot{s} &= u
\end{aligned}$$

with initial conditions $s(0) = 0$, $\dot{s}(0) = .03$ and parameter values $g = 9.81$ m/s$^2$, $\mu = .15$, $\beta = \frac{\pi}{10}$ rad, $m = 35$
kg, and $F_{\text{applied}} = 75$ N. No conversion of units is needed. We plug the parameter values into the system to
get the initial value problem

$$\begin{aligned}
\dot{u} &= \frac{75}{35}\left(\cos\frac{\pi}{10} + .15\sin\frac{\pi}{10}\right) - .15(9.81) \approx 0.6658051402529905 \\
\dot{s} &= u \\
u_0 &= \dot{s}(0) = .03 \\
s_0 &= s(0) = 0
\end{aligned}$$

Applying Euler’s method to this system means iterating

$$\begin{aligned}
u_{n+1} &= u_n + h\dot{u}(u_n, s_n) = u_n + 0.25(0.6658051402529905) \\
s_{n+1} &= s_n + h\dot{s}(u_n, s_n) = s_n + 0.25u_n \\
t_{n+1} &= t_n + h
\end{aligned}$$

In particular,

$$\begin{aligned}
u_1 &= u_0 + 0.25\dot{u}(u_0, s_0) \\
&= .03 + 0.25(0.6658051402529905) \approx 0.1964512850632476 \\
s_1 &= s_0 + 0.25u_0 \\
&= 0 + 0.25(.03) = 0.0075 \\
t_1 &= t_0 + 0.25 = .25
\end{aligned}$$

and

$$\begin{aligned}
u_2 &= u_1 + 0.25\dot{u}(u_1, s_1) \\
&\approx 0.1964 + 0.25(0.6658) \approx 0.3629025701264953 \\
s_2 &= s_1 + 0.25u_1 \\
&\approx 0.0075 + 0.25(0.1964) \approx 0.05661282126581191 \\
t_2 &= t_1 + 0.25 = .5
\end{aligned}$$

Therefore, $s(0.5) \approx 0.05661282126581191$.

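15m: The system we are solving is

$$\begin{aligned}
\dot{u} &= -\frac{c}{m}u - g \\
\dot{s} &= u
\end{aligned}$$

with initial conditions $s(0) = 2000$, $\dot{s}(0) = -55$ and parameter values $g = 9.81$ m/s$^2$, $c = 26$, and $m = 70$
kg. No conversion of units is needed. We plug the parameter values into the system to get the initial value
problem

$$\begin{aligned}
\dot{u} &= -\frac{26}{70}u - 9.81 = -\frac{13}{35}u - 9.81 \\
\dot{s} &= u \\
u_0 &= \dot{s}(0) = -55 \\
s_0 &= s(0) = 2000
\end{aligned}$$

Applying Euler’s method to this system means iterating

$$\begin{aligned}
u_{n+1} &= u_n + h\dot{u}(u_n, s_n) = u_n + 0.25\left(-\frac{13}{35}u_n - 9.81\right) \\
s_{n+1} &= s_n + h\dot{s}(u_n, s_n) = s_n + 0.25u_n \\
t_{n+1} &= t_n + h
\end{aligned}$$

In particular,

$$\begin{aligned}
u_1 &= u_0 + 0.25\dot{u}(u_0, s_0) \\
&= -55 + 0.25\left(-\frac{13}{35}(-55) - 9.81\right) \approx -52.34535714285715 \\
s_1 &= s_0 + 0.25u_0 \\
&= 2000 + 0.25(-55) = 1986.25 \\
t_1 &= t_0 + 0.25 = .25
\end{aligned}$$

and

$$\begin{aligned}
u_2 &= u_1 + 0.25\dot{u}(u_1, s_1) \\
&\approx -52.34 + 0.25\left(-\frac{13}{35}(-52.34) - 9.81\right) \approx -49.9372168367347 \\
s_2 &= s_1 + 0.25u_1 \\
&\approx 1986.25 + 0.25(-52.34) \approx 1973.163660714286 \\
t_2 &= t_1 + 0.25 = .5
\end{aligned}$$

Therefore, $s(0.5) \approx 1973.163660714286$.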
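
Euler's method for a system is the same loop as for a single equation, with one update per unknown. The following minimal sketch reproduces the two steps of 15m; the variable names are chosen for illustration. The one thing to be careful about is that both updates must use the old values of u and s, which is why the new values are stored in temporaries first.

```octave
% Two steps of Euler's method for the sky diver system of 15m, h = 0.25.
c = 26;  m = 70;  g = 9.81;        % parameter values from the exercise
udot = @(u, s) -(c/m)*u - g;       % u' = -(c/m) u - g
sdot = @(u, s) u;                  % s' = u
h = 0.25;
t = 0;  u = -55;  s = 2000;        % initial conditions
for n = 1:2
  unew = u + h*udot(u, s);         % both updates use the OLD u and s
  snew = s + h*sdot(u, s);
  u = unew;  s = snew;  t = t + h;
  fprintf('t = %.2f, u = %.10f, s = %.10f\n', t, u, s);
end
```

Updating u in place before computing the new s would silently turn this into a different method, so the temporaries are not optional.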
18a: A number of differential equations solution techniques require you to have some idea what the solution will
be before you know exactly what it is. You then take this “rough guess” and refine it by forcing it to solve the
given differential equation. The method of undetermined coefficients is an example of such a technique. We
know the solution will be a linear combination of certain functions, but we don’t know the right coefficients
to use. To find the coefficients, we plug the solution with unknown (undetermined) coefficients into the
differential equation and match the coefficients of like terms. This process leaves us with a linear system of
equations to solve for the unknowns. In this particular example, we are given that $y(x) = Ax^2 + Bx + C$ is a
solution of $y'' + 5y' - 8y = 3x^2$, and it is our job to figure out the values of $A$, $B$, and $C$. We will find $y'$ and
$y''$ and substitute them into the o.d.e.:

$$\begin{aligned}
y'(x) &= 2Ax + B \\
y''(x) &= 2A
\end{aligned}$$

Therefore

$$y'' + 5y' - 8y = 2A + 5(2Ax + B) - 8(Ax^2 + Bx + C).$$

Thus, if we are to have a solution of the o.d.e., we will need

$$2A + 5(2Ax + B) - 8(Ax^2 + Bx + C) = 3x^2$$

Simplifying, that is

$$-8Ax^2 + (10A - 8B)x + (2A + 5B - 8C) = 3x^2.$$


Matching the coefficients of like terms on the left and the right, we have 


$$\begin{aligned}
-8A &= 3 \\
10A - 8B &= 0 \\
2A + 5B - 8C &= 0.
\end{aligned}$$

The solution of this system is $A = -\frac{3}{8}$, $B = -\frac{15}{32}$, $C = -\frac{99}{256}$. Hence the solution of the o.d.e. is
$y(x) = -\frac{3}{8}x^2 - \frac{15}{32}x - \frac{99}{256}$.

18h: A number of differential equations solution techniques require you to have some idea what the solution will
be before you know exactly what it is. You then take this “rough guess” and refine it by forcing it to solve the
given differential equation. The method of undetermined coefficients is an example of such a technique. We
know the solution will be a linear combination of certain functions, but we don’t know the right coefficients to
use. To find the coefficients, we plug the solution with unknown (undetermined) coefficients into the differential
equation and match the coefficients of like terms. This process leaves us with a linear system of equations to
solve for the unknowns. In this particular example, we are given that $\theta(t) = At\cos t + Bt\sin t + C\cos t + D\sin t$
is a solution of $\ddot{\theta} + \frac{1}{10}\dot{\theta} + \theta = t\cos t$, and it is our job to figure out the values of $A$, $B$, $C$, and $D$. We will find
$\dot{\theta}$ and $\ddot{\theta}$ and substitute them into the o.d.e.:

$$\begin{aligned}
\dot{\theta}(t) &= (D + A)\cos(t) + (B - C)\sin(t) + Bt\cos(t) - At\sin(t) \\
\ddot{\theta}(t) &= (2B - C)\cos(t) + (-D - 2A)\sin(t) - At\cos(t) - Bt\sin(t)
\end{aligned}$$

Therefore

$$\begin{aligned}
\ddot{\theta} + \frac{1}{10}\dot{\theta} + \theta = {} & (2B - C)\cos(t) + (-D - 2A)\sin(t) - At\cos(t) - Bt\sin(t) \\
& + \frac{1}{10}\left((D + A)\cos(t) + (B - C)\sin(t) + Bt\cos(t) - At\sin(t)\right) \\
& + At\cos t + Bt\sin t + C\cos t + D\sin t
\end{aligned}$$

Simplifying, that is

$$\begin{aligned}
\ddot{\theta} + \frac{1}{10}\dot{\theta} + \theta = {} & \left(\frac{1}{10}D + 2B + \frac{1}{10}A\right)\cos(t) \\
& + \left(\frac{1}{10}B - \frac{1}{10}C - 2A\right)\sin(t) \\
& + \frac{1}{10}Bt\cos(t) \\
& - \frac{1}{10}At\sin(t)
\end{aligned}$$
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Thus, if we are to have a solution of the o.d.e., we will need 


(' Io D + 2B + Io A ) cos(i) 

+ {B-C-2A) sin (t) 

1 

+ — Bt cos (t) 

— -^-Atsin(t) = tcost 


Matching the coefficients of like terms on the left and the right, we have 


—D + 2 B+ —A 
10 10 

B-C-2A 


0 

0 

1 

0 


The solution of this system is A = 0, B = 10, C = 10, D = —200. Hence the solution of the o.d.e. is 
9{t) = lOt sin t — 200 sin t + 10 cos t. 
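As a sanity check on 18h, the residual of the o.d.e. can be evaluated numerically. The sketch below is in Python rather than Octave, and the function names `theta`, `dtheta`, `ddtheta`, and `residual` are illustrative choices, not names from the text. It uses the derivative formulas worked out above and should report a residual that is zero up to rounding.

```python
import math

# Coefficients found in 18h for theta(t) = A t cos t + B t sin t + C cos t + D sin t
A, B, C, D = 0.0, 10.0, 10.0, -200.0

def theta(t):
    return A*t*math.cos(t) + B*t*math.sin(t) + C*math.cos(t) + D*math.sin(t)

def dtheta(t):
    # First derivative, in the grouped form used in the solution
    return (D + A)*math.cos(t) + (B - C)*math.sin(t) + B*t*math.cos(t) - A*t*math.sin(t)

def ddtheta(t):
    # Second derivative, in the grouped form used in the solution
    return (2*B - C)*math.cos(t) + (-D - 2*A)*math.sin(t) - A*t*math.cos(t) - B*t*math.sin(t)

def residual(t):
    # theta'' + theta'/10 + theta - t cos t, which vanishes for a true solution
    return ddtheta(t) + dtheta(t)/10 + theta(t) - t*math.cos(t)

# Sample the residual on [0, 10) and report the largest magnitude
max_residual = max(abs(residual(0.1*k)) for k in range(100))
print(max_residual)
```

The same idea works for 18a: substitute the computed coefficients back into the left side and compare with $3x^2$ at a few sample points.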


Section 6.3 

1a: Each o.d.e. solver has the form

$$y_{i+1} = y_i + h\,(\text{weighted average of evaluations of } f).$$

It is the integration formula that gives us the weighted average. In this case, the formula

$$\frac{h}{4}\left(f(x_0) + 3f\left(x_0 + \tfrac{2}{3}h\right)\right)$$

tells us to average $f(x_0)$, the value of $f$ at the first node, with $f(x_0 + \frac{2}{3}h)$ in a 1 : 3 ratio. That is, we sum one
$f(x_0)$ with three $f(x_0 + \frac{2}{3}h)$ and divide by 4. Unfortunately, we are using $f$ here in two different settings. The
$f$ in an o.d.e. solver is not the same $f$ used in deriving the integration formulas. The $f$ from the integration
formulas is a function of one variable, $x$. The $f$ we need in an o.d.e. solver is a function of two variables, $t$
and $y$. Nevertheless, they play the same role. They each hold the values of the function we are integrating. If
we need to sum one $f(x_0)$ with three $f(x_0 + \frac{2}{3}h)$ in the integration formula, then we need to sum one $f(t_i, y_i)$
with three $f(t_{i+2/3}, y_{i+2/3})$ in the o.d.e. solver. Generally, $f(x_0 + \alpha h)$ in an integration formula translates to
$f(t_{i+\alpha}, y_{i+\alpha})$ in the o.d.e. solver as long as the integration formula is written for an interval of length $h$.

Each o.d.e. solver begins with $k_1 = f(t_i, y_i)$ where $(t_i, y_i)$ is the last point approximated. Each successive
value in the o.d.e. solver is obtained by using Euler’s method with initial condition (starting point) $(t_i, y_i)$.
For this particular integration formula, there is only one node other than $x_0$, so we will need only one more
stage. We approximate $y_{i+2/3}$ by $y_i + \frac{2h}{3}k_1$ (Euler’s method using starting point $(t_i, y_i)$ and approximate
slope $k_1$). This makes $k_2 = f(t_i + \frac{2h}{3}, y_i + \frac{2h}{3}k_1)$. The final step is to compute the weighted average. As
discussed, we need to sum one $k_1$ with three $k_2$ and divide by 4. In summary, the o.d.e. solver suggested by
this integration formula is

$$k_1 = f(t_i, y_i)$$
$$k_2 = f\left(t_i + \tfrac{2h}{3},\; y_i + \tfrac{2h}{3}k_1\right)$$
$$y_{i+1} = y_i + \tfrac{h}{4}\left[k_1 + 3k_2\right].$$
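The summary formulas for 1a translate almost line for line into code. The following is a minimal Python sketch of a single step. The function name `step` and the test problem $y' = t$ are assumptions made here for illustration and are not from the exercise.

```python
def step(f, t, y, h):
    """One step of the two-stage solver summarized in 1a."""
    k1 = f(t, y)                        # slope at the last approximated point
    k2 = f(t + 2*h/3, y + 2*h/3*k1)     # slope at the Euler estimate for t + 2h/3
    return y + h/4*(k1 + 3*k2)          # 1 : 3 weighted average of the slopes

# Illustrative problem (an assumption): y' = t, y(0) = 0, exact solution t^2/2
y1 = step(lambda t, y: t, 0.0, 0.0, 0.5)
print(y1)
```

For this particular right-hand side the step reproduces $t^2/2$ up to rounding, because the underlying integration formula is exact for linear integrands.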


1e: Each o.d.e. solver has the form

$$y_{i+1} = y_i + h\,(\text{weighted average of evaluations of } f).$$

It is the integration formula that gives us the weighted average. In this case, the formula

$$\frac{h}{4}\left(3f\left(x_0 + \tfrac{1}{3}h\right) + f(x_0 + h)\right)$$

tells us to average $f(x_0 + \frac{1}{3}h)$, the value of $f$ at the first node, with $f(x_0 + h)$ in a 3 : 1 ratio. That is, we
sum three $f(x_0 + \frac{1}{3}h)$ with one $f(x_0 + h)$ and divide by 4. Unfortunately, we are using $f$ here in two different
settings. The $f$ in an o.d.e. solver is not the same $f$ used in deriving the integration formulas. The $f$ from
the integration formulas is a function of one variable, $x$. The $f$ we need in an o.d.e. solver is a function of
two variables, $t$ and $y$. Nevertheless, they play the same role. They each hold the values of the function we
are integrating. If we need to sum three $f(x_0 + \frac{1}{3}h)$ with one $f(x_0 + h)$ in the integration formula, then we
need to sum three $f(t_{i+1/3}, y_{i+1/3})$ with one $f(t_{i+1}, y_{i+1})$ in the o.d.e. solver. Generally, $f(x_0 + \alpha h)$ in an
integration formula translates to $f(t_{i+\alpha}, y_{i+\alpha})$ in the o.d.e. solver as long as the integration formula is written
for an interval of length $h$.

Each o.d.e. solver begins with $k_1 = f(t_i, y_i)$ where $(t_i, y_i)$ is the last point approximated. Each successive
value in the o.d.e. solver is obtained by using Euler’s method with initial condition (starting point) $(t_i, y_i)$.
For this particular integration formula, there are two nodes other than $x_0$, so we will need two more stages.
We approximate $y_{i+1/3}$ by $y_i + \frac{h}{3}k_1$ (Euler’s method using starting point $(t_i, y_i)$ and approximate slope $k_1$).
This makes $k_2 = f(t_i + \frac{h}{3}, y_i + \frac{h}{3}k_1)$. We then approximate $y_{i+1}$ by $y_i + hk_2$ (Euler’s method using starting
point $(t_i, y_i)$ and approximate slope $k_2$). The final step is to compute the weighted average. As discussed, we
need to sum three $k_2$ with one $k_3$ and divide by 4. In summary, the o.d.e. solver suggested by this integration
formula is

$$k_1 = f(t_i, y_i)$$
$$k_2 = f\left(t_i + \tfrac{h}{3},\; y_i + \tfrac{h}{3}k_1\right)$$
$$k_3 = f(t_i + h,\; y_i + hk_2)$$
$$y_{i+1} = y_i + \tfrac{h}{4}\left[3k_2 + k_3\right].$$


2a: We will modify the test code from the text in two essential ways.

1. It will be adapted for the o.d.e. solver

$$k_1 = f(t_i, y_i)$$
$$k_2 = f\left(t_i + \tfrac{2h}{3},\; y_i + \tfrac{2h}{3}k_1\right)$$
$$y_{i+1} = y_i + \tfrac{h}{4}\left[k_1 + 3k_2\right]$$

2. An extra loop will be added so it approximates $y(2)$ for a number of step sizes.

These modifications will make it a simple matter to determine the rate of convergence.


t0=4;
h=-1/4;
n=8;
f=inline("-y/t+t^2");
exact=inline("t^3/4+16/t");
y0=20;
disp('           h           y       Error')
disp(' ')
for j=1:6
  t=t0;
  y=y0;
  for i=1:n
    k1=f(t,y);
    k2=f(t+2*h/3,y+2*h/3*k1);
    y=y+h/4*(k1+3*k2);
    t=t+h;
  end%for
  x=exact(t);
  sprintf('%12.5g%12.5g%12.5g',h,y,abs(y-x))
  n=n*2;
  h=h/2;
end%for

The output from this code is

           h           y       Error
ans =        -0.25      9.9391    0.060922
ans =       -0.125      9.9846    0.015433
ans =      -0.0625      9.9961   0.0038827
ans =     -0.03125       9.999  0.00097364
ans =    -0.015625      9.9998  0.00024378
ans =   -0.0078125      9.9999   6.099e-05


The ratio of the step size on one line to the next is $\frac{1}{2}$ and the ratio of consecutive errors is about $\frac{1}{4} = \left(\frac{1}{2}\right)^2$,
so it appears the o.d.e. solver has rate of convergence $O(h^2)$. The integration method has rate of convergence
$O(h^4)$ so we would expect the o.d.e. solver to be $O(h^3)$. Our experiment does not show the expected rate of
convergence.
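Rather than eyeballing the ratios, the observed order can be computed as $\log_2$ of the ratio of consecutive errors. Here is a Python sketch of the same experiment. The helper names `solve`, `errors`, and `orders` are illustrative, and the step size is derived from the number of steps instead of being halved in a loop.

```python
import math

def f(t, y):
    return -y/t + t**2

def exact(t):
    return t**3/4 + 16/t

def solve(n, t0=4.0, y0=20.0, t_end=2.0):
    """Error at t_end after n steps of the solver from exercise 1a."""
    h = (t_end - t0)/n          # negative here, since we integrate from 4 down to 2
    t, y = t0, y0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + 2*h/3, y + 2*h/3*k1)
        y += h/4*(k1 + 3*k2)
        t += h
    return abs(y - exact(t_end))

# n = 8, 16, ..., 256 corresponds to h = -1/4, -1/8, ..., -1/128
errors = [solve(8*2**j) for j in range(6)]
# Halving h multiplies an O(h^p) error by about (1/2)^p, so p is about log2 of the ratio
orders = [math.log2(errors[j]/errors[j+1]) for j in range(5)]
print(orders)
```

Observed orders near 2 support the $O(h^2)$ conclusion drawn from the table.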

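2e: We will modify the test code from the text in two essential ways.

1. It will be adapted for the o.d.e. solver

$$k_1 = f(t_i, y_i)$$
$$k_2 = f\left(t_i + \tfrac{h}{3},\; y_i + \tfrac{h}{3}k_1\right)$$
$$k_3 = f(t_i + h,\; y_i + hk_2)$$
$$y_{i+1} = y_i + \tfrac{h}{4}\left[3k_2 + k_3\right]$$

2. An extra loop will be added so it approximates $y(2)$ for a number of step sizes.

These modifications will make it a simple matter to determine the rate of convergence.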

t0=4;
h=-1/4;
n=8;
f=inline("-y/t+t^2");
exact=inline("t^3/4+16/t");
y0=20;
disp('           h           y       Error')
disp(' ')
for j=1:6
  t=t0;
  y=y0;
  for i=1:n
    k1=f(t,y);
    k2=f(t+h/3,y+h/3*k1);
    k3=f(t+h,y+h*k2);
    y=y+h/4*(3*k2+k3);
    t=t+h;
  end%for
  x=exact(t);
  sprintf('%12.5g%12.5g%12.5g',h,y,abs(y-x))
  n=n*2;
  h=h/2;
end%for

The output from this code is

           h           y       Error
ans =        -0.25      9.9697     0.03027
ans =       -0.125      9.9923   0.0076889
ans =      -0.0625      9.9981   0.0019376
ans =     -0.03125      9.9995  0.00048634
ans =    -0.015625      9.9999  0.00012183
ans =   -0.0078125          10  3.0487e-05


The ratio of the step size on one line to the next is $\frac{1}{2}$ and the ratio of consecutive errors is about $\frac{1}{4} = \left(\frac{1}{2}\right)^2$,
so it appears the o.d.e. solver has rate of convergence $O(h^2)$. The integration method has rate of convergence
$O(h^4)$ so we would expect the o.d.e. solver to be $O(h^3)$. Our experiment does not show the expected rate of
convergence.

8a: The Octave function we wrote to implement Euler’s method takes 5 arguments. As explained in the comment
preceding the function declaration,

% INPUT: function f(x,y); interval [a,b]; y(a); steps n              %
% OUTPUT: approximation (x(i),y(i)) of the solution of y'=f(x,y)     %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [y,x] = eulerode(f,a,ya,b,n)

they are, in order, (f) the function $f(x, y)$ appearing on the right side of the o.d.e., (a) the $x$-coordinate of the
initial condition, (ya) the $y$-coordinate of the initial condition, (b) the $x$-coordinate of the desired solution,
and (n) the number of steps that should be taken. From the Octave command line, the solution can be found
this way:

>> format('long')
>> f=inline('3*x-2*y')
f = f(x, y) = 3*x-2*y
>> eulerode(f,1,1,2,20)
ans =
 Columns 1 through 4:
   1.00000000000000   1.05000000000000   1.10250000000000   1.15725000000000
 Columns 5 through 8:
   1.21402500000000   1.27262250000000   1.33286025000000   1.39457422500000
 Columns 9 through 12:
   1.45761680250000   1.52185512225000   1.58716961002500   1.65345264902250
 Columns 13 through 16:
   1.72060738412025   1.78854664570823   1.85719198113740   1.92647278302366
 Columns 17 through 20:
   1.99632550472130   2.06669295424917   2.13752365882425   2.20877129294183
 Column 21:
   2.28039416364764

The value in Column 21 is the desired result, so $y(2) \approx 2.28039416364764$. The rest of the output gives
approximations for the solution at other points. For example, $y(1.95) \approx 2.20877129294183$. Use [y,x]=eulerode(f,1,1,2,20)
to see all the corresponding $x$-coordinates.
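The body of eulerode is not reproduced in this solution, so the following Python sketch is an assumption about what such a routine does, keeping only the argument order and the two returned lists from the declaration above. It reproduces the session values for $y' = 3x - 2y$, $y(1) = 1$.

```python
def eulerode(f, a, ya, b, n):
    """Euler's method with n equal steps on [a, b]; returns (y, x) as lists.

    A hypothetical Python stand-in for the Octave function of the same name.
    """
    h = (b - a)/n
    x = [a]
    y = [ya]
    for i in range(n):
        y.append(y[i] + h*f(x[i], y[i]))    # follow the tangent line for one step
        x.append(a + (b - a)*(i + 1)/n)     # recompute x to avoid accumulated drift
    return y, x

y, x = eulerode(lambda x, y: 3*x - 2*y, 1.0, 1.0, 2.0, 20)
print(x[20], y[20])   # the approximation of y(2)
print(x[19], y[19])   # the approximation of y(1.95)
```

Python lists are indexed from 0, so Octave's Column 21 corresponds to `y[20]` here.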

8d: The Octave function we wrote to implement Euler’s method takes 5 arguments. As explained in the comment
preceding the function declaration,

% INPUT: function f(x,y); interval [a,b]; y(a); steps n              %
% OUTPUT: approximation (x(i),y(i)) of the solution of y'=f(x,y)     %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [y,x] = eulerode(f,a,ya,b,n)

they are, in order, (f) the function $f(x, y)$ appearing on the right side of the o.d.e., (a) the $x$-coordinate of the
initial condition, (ya) the $y$-coordinate of the initial condition, (b) the $x$-coordinate of the desired solution,
and (n) the number of steps that should be taken. From the Octave command line, the solution can be found
this way:

>> format('long')
>> f=inline('(2*cos(x)^3-1-y*sin(x))/cos(x)')
f = f(x, y) = (2*cos(x)^3-1-y*sin(x))/cos(x)
>> eulerode(f,1,0,2,20)
ans =
 Columns 1 through 3:
   0.000000000000000  -0.063348127711403  -0.133556806761731
 Columns 4 through 6:
  -0.210091730766547  -0.292335849279218  -0.379594108676440
 Columns 7 through 9:
  -0.471098428249811  -0.566012332190405  -0.663433947473280
 Columns 10 through 12:
  -0.762393924730387  -0.861836463006993  -0.960521838453174
 Columns 13 through 15:
  -1.055901027787366  -1.150767311038156  -1.243138035592362
 Columns 16 through 18:
  -1.331810188637979  -1.415726818259857  -1.493905125626401
 Columns 19 through 21:
  -1.565422860316011  -1.629418404020635  -1.685095172485204

The value in Column 21 is the desired result, so $y(2) \approx -1.685095172485$. The rest of the output gives
approximations for the solution at other points. For example, $y(1.95) \approx -1.629418404020$. Use [y,x]=eulerode(f,1,0,2,20)
to see all the corresponding $x$-coordinates.

9a: The Octave functions we wrote to implement other methods take 5 arguments. Here, we imagine a similar
function for trapezoidal-ode has been written and looks like

% INPUT: function f(x,y); interval [a,b]; y(a); steps n              %
% OUTPUT: approximation (x(i),y(i)) of the solution of y'=f(x,y)     %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [y,x] = trapode(f,a,ya,b,n)

The arguments are, in order, (f) the function $f(x, y)$ appearing on the right side of the o.d.e., (a) the
$x$-coordinate of the initial condition, (ya) the $y$-coordinate of the initial condition, (b) the $x$-coordinate of the
desired solution, and (n) the number of steps that should be taken. From the Octave command line, the
solution can be found this way:

>> format('long')
>> f=inline('3*x-2*y')
f = f(x, y) = 3*x-2*y
>> trapode(f,1,1,2,20)
ans =
 Columns 1 through 4:
   1.00000000000000   1.05125000000000   1.10475625000000   1.16030440625000
 Columns 5 through 8:
   1.21770048765625   1.27676894132891   1.33735089190266   1.39930255717191
 Columns 9 through 12:
   1.46249381424058   1.52680690188772   1.59213524620839   1.65838239781859
 Columns 13 through 16:
   1.72546107002583   1.79329226837337   1.86180450287790   1.93093307510450
 Columns 17 through 20:
   2.00061943296957   2.07081058683746   2.14145858108790   2.21252001588455
 Column 21:
   2.28395561437552

The value in Column 21 is the desired result, so $y(2) \approx 2.28395561437552$. The rest of the output gives
approximations for the solution at other points. For example, $y(1.95) \approx 2.21252001588455$. Use [y,x]=trapode(f,1,1,2,20)
to see all the corresponding $x$-coordinates.
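Since trapode is only imagined in this solution, here is one way it might look, sketched in Python. It assumes trapezoidal-ode is the improved Euler method, $k_1 = f(t_i, y_i)$, $k_2 = f(t_i + h, y_i + hk_1)$, $y_{i+1} = y_i + \frac{h}{2}[k_1 + k_2]$. The function body is a guess at the structure, not the book's code.

```python
def trapode(f, a, ya, b, n):
    """Trapezoidal-ode (improved Euler) with n equal steps; returns (y, x).

    A hypothetical stand-in for the imagined Octave function trapode.
    """
    h = (b - a)/n
    x = [a]
    y = [ya]
    for i in range(n):
        k1 = f(x[i], y[i])                   # slope at the left end of the step
        k2 = f(x[i] + h, y[i] + h*k1)        # slope at the Euler estimate of the right end
        y.append(y[i] + h/2*(k1 + k2))       # average the two slopes
        x.append(a + (b - a)*(i + 1)/n)
    return y, x

y, x = trapode(lambda x, y: 3*x - 2*y, 1.0, 1.0, 2.0, 20)
print(y[20])   # the approximation of y(2)
```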
9d: The Octave functions we wrote to implement other methods take 5 arguments. Here, we imagine a similar
function for trapezoidal-ode has been written and looks like

% INPUT: function f(x,y); interval [a,b]; y(a); steps n              %
% OUTPUT: approximation (x(i),y(i)) of the solution of y'=f(x,y)     %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [y,x] = trapode(f,a,ya,b,n)

The arguments are, in order, (f) the function $f(x, y)$ appearing on the right side of the o.d.e., (a) the
$x$-coordinate of the initial condition, (ya) the $y$-coordinate of the initial condition, (b) the $x$-coordinate of the
desired solution, and (n) the number of steps that should be taken. From the Octave command line, the
solution can be found this way:

>> format('long')
>> f=inline('(2*cos(x)^3-1-y*sin(x))/cos(x)')
f = f(x, y) = (2*cos(x)^3-1-y*sin(x))/cos(x)
>> trapode(f,1,0,2,20)
ans =
 Columns 1 through 3:
   0.000000000000000  -0.066778403380866  -0.139846898631295
 Columns 4 through 6:
  -0.218610595984683  -0.302399307505556  -0.390473688925680
 Columns 7 through 9:
  -0.482031924143591  -0.576216643912361  -0.672121275727591
 Columns 10 through 12:
  -0.768792826665983  -0.865212265743696  -0.959857757799220
 Columns 13 through 15:
  -1.056576584732967  -1.151350240932434  -1.242238115924874
 Columns 16 through 18:
  -1.328187356783625  -1.408239476567505  -1.481492346014993
 Columns 19 through 21:
  -1.547099820528092  -1.604277373646634  -1.652308958787397

The value in Column 21 is the desired result, so $y(2) \approx -1.652308958787$. The rest of the output gives
approximations for the solution at other points. For example, $y(1.95) \approx -1.604277373646$. Use [y,x]=trapode(f,1,0,2,20)
to see all the corresponding $x$-coordinates.

10a: The Octave functions we wrote to implement other methods take 5 arguments. Here, we imagine a similar
function for clopen-ode has been written and looks like

% INPUT: function f(x,y); interval [a,b]; y(a); steps n              %
% OUTPUT: approximation (x(i),y(i)) of the solution of y'=f(x,y)     %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [y,x] = clopen(f,a,ya,b,n)

The arguments are, in order, (f) the function $f(x, y)$ appearing on the right side of the o.d.e., (a) the
$x$-coordinate of the initial condition, (ya) the $y$-coordinate of the initial condition, (b) the $x$-coordinate of the
desired solution, and (n) the number of steps that should be taken. From the Octave command line, the
solution can be found this way:

>> format('long')
>> f=inline('3*x-2*y')
f = f(x, y) = 3*x-2*y
>> clopen(f,1,1,2,20)
ans =
 Columns 1 through 4:
   1.00000000000000   1.05120833333333   1.10468084027778   1.16020204697801
 Columns 5 through 8:
   1.21757698550727   1.27662924238649   1.33719919281938   1.39914240296940
 Columns 9 through 12:
   1.46232818428681   1.52663828541552   1.59196570858681   1.65821363865296
 Columns 13 through 16:
   1.72529447404116   1.79312894992824   1.86164534486007   1.93077876287422
 Columns 17 through 20:
   2.00047048394069   2.07066737621900   2.14132136424883   2.21238894775115
 Column 21:
   2.28383076622349

The value in Column 21 is the desired result, so $y(2) \approx 2.28383076622349$. The rest of the output gives
approximations for the solution at other points. For example, $y(1.95) \approx 2.21238894775115$. Use [y,x]=clopen(f,1,1,2,20)
to see all the corresponding $x$-coordinates.

10d: The Octave functions we wrote to implement other methods take 5 arguments. Here, we imagine a similar
function for clopen-ode has been written and looks like

% INPUT: function f(x,y); interval [a,b]; y(a); steps n              %
% OUTPUT: approximation (x(i),y(i)) of the solution of y'=f(x,y)     %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [y,x] = clopen(f,a,ya,b,n)

The arguments are, in order, (f) the function $f(x, y)$ appearing on the right side of the o.d.e., (a) the
$x$-coordinate of the initial condition, (ya) the $y$-coordinate of the initial condition, (b) the $x$-coordinate of the
desired solution, and (n) the number of steps that should be taken. From the Octave command line, the
solution can be found this way:

>> format('long')
>> f=inline('(2*cos(x)^3-1-y*sin(x))/cos(x)')
f = f(x, y) = (2*cos(x)^3-1-y*sin(x))/cos(x)
>> clopen(f,1,0,2,20)
ans =
 Columns 1 through 3:
   0.000000000000000  -0.066674788135152  -0.139650010793905
 Columns 4 through 6:
  -0.218333343735571  -0.302057681694326  -0.390087513340042
 Columns 7 through 9:
  -0.481626032825074  -0.575822930559361  -0.671782830658695
 Columns 10 through 12:
  -0.768574489070735  -0.865241984556076  -0.960839121780159
 Columns 13 through 15:
  -1.051332254162207  -1.136768664871208  -1.218181121459446
 Columns 16 through 18:
  -1.294632701999881  -1.365219285669536  -1.429077386836689
 Columns 19 through 21:
  -1.485393339498179  -1.533411938658838  -1.572444496803329

The value in Column 21 is the desired result, so $y(2) \approx -1.572444496803329$. The rest of the output gives
approximations for the solution at other points. For example, $y(1.95) \approx -1.533411938658838$. Use [y,x]=clopen(f,1,0,2,20)
to see all the corresponding $x$-coordinates.

11a: The Octave function we wrote to implement the midpoint method takes 5 arguments. As explained in the
comment preceding the function declaration,

% INPUT: function f(x,y); interval [a,b]; y(a); steps n              %
% OUTPUT: approximation (x(i),y(i)) of the solution of y'=f(x,y)     %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [y,x] = midpoint(f,a,ya,b,n)

they are, in order, (f) the function $f(x, y)$ appearing on the right side of the o.d.e., (a) the $x$-coordinate of the
initial condition, (ya) the $y$-coordinate of the initial condition, (b) the $x$-coordinate of the desired solution,
and (n) the number of steps that should be taken. From the Octave command line, the solution can be found
this way:

>> format('long')
>> f=inline('3*x-2*y')
f = f(x, y) = 3*x-2*y
>> midpoint(f,1,1,2,20)
ans =
 Columns 1 through 4:
   1.00000000000000   1.05125000000000   1.10475625000000   1.16030440625000
 Columns 5 through 8:
   1.21770048765625   1.27676894132891   1.33735089190266   1.39930255717191
 Columns 9 through 12:
   1.46249381424058   1.52680690188772   1.59213524620839   1.65838239781859
 Columns 13 through 16:
   1.72546107002582   1.79329226837337   1.86180450287790   1.93093307510450
 Columns 17 through 20:
   2.00061943296957   2.07081058683746   2.14145858108790   2.21252001588455
 Column 21:
   2.28395561437552

The value in Column 21 is the desired result, so $y(2) \approx 2.28395561437552$. The rest of the output gives
approximations for the solution at other points. For example, $y(1.95) \approx 2.21252001588455$. Use [y,x]=midpoint(f,1,1,2,20)
to see all the corresponding $x$-coordinates.
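The midpoint output for this o.d.e. agrees with the trapezoidal-ode output of 9a except in the last digit or so. That is no accident: when $f(x, y)$ is linear in both $x$ and $y$, as $3x - 2y$ is, every two-stage second-order Runge-Kutta step reduces to $y_i + hf + \frac{h^2}{2}(f_x + f_y f)$. The Python sketch below illustrates this with hypothetical helpers `midpoint_step`, `trap_step`, and `run`. The step formulas are the standard midpoint and improved Euler ones and are assumed to match the book's functions.

```python
def midpoint_step(f, x, y, h):
    k1 = f(x, y)
    k2 = f(x + h/2, y + h/2*k1)     # slope at the estimated midpoint
    return y + h*k2

def trap_step(f, x, y, h):
    k1 = f(x, y)
    k2 = f(x + h, y + h*k1)         # slope at the estimated right endpoint
    return y + h/2*(k1 + k2)

def run(step, f, a, ya, b, n):
    """Apply the given one-step formula n times on [a, b]; returns the list of y values."""
    h = (b - a)/n
    ys = [ya]
    for i in range(n):
        ys.append(step(f, a + (b - a)*i/n, ys[i], h))
    return ys

f = lambda x, y: 3*x - 2*y
mid = run(midpoint_step, f, 1.0, 1.0, 2.0, 20)
trap = run(trap_step, f, 1.0, 1.0, 2.0, 20)
max_gap = max(abs(p - q) for p, q in zip(mid, trap))
print(mid[20], trap[20], max_gap)
```

For a right-hand side that is not linear, such as the one in 11d, the two methods give visibly different results.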

11d: The Octave function we wrote to implement the midpoint method takes 5 arguments. As explained in the
comment preceding the function declaration,

% INPUT: function f(x,y); interval [a,b]; y(a); steps n              %
% OUTPUT: approximation (x(i),y(i)) of the solution of y'=f(x,y)     %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [y,x] = midpoint(f,a,ya,b,n)

they are, in order, (f) the function $f(x, y)$ appearing on the right side of the o.d.e., (a) the $x$-coordinate of the
initial condition, (ya) the $y$-coordinate of the initial condition, (b) the $x$-coordinate of the desired solution,
and (n) the number of steps that should be taken. From the Octave command line, the solution can be found
this way:

>> format('long')
>> f=inline('(2*cos(x)^3-1-y*sin(x))/cos(x)')
f = f(x, y) = (2*cos(x)^3-1-y*sin(x))/cos(x)
>> midpoint(f,1,0,2,20)
ans =
 Columns 1 through 3:
   0.000000000000000  -0.066766774094073  -0.139831999606821
 Columns 4 through 6:
  -0.218600428030388  -0.302401486830318  -0.390495486389841
 Columns 7 through 9:
  -0.482080439082276  -0.576299298636036  -0.672247230148908
 Columns 10 through 12:
  -0.768977840728485  -0.865503930033315  -0.960754716787988
 Columns 13 through 15:
  -1.057757600117324  -1.154510687305015  -1.247336119828964
 Columns 16 through 18:
  -1.335197000042218  -1.417135309027307  -1.492245593754752
 Columns 19 through 21:
  -1.559677244661507  -1.618640905170988  -1.668415622421331

The value in Column 21 is the desired result, so $y(2) \approx -1.668415622421$. The rest of the output gives
approximations for the solution at other points. For example, $y(1.95) \approx -1.618640905170$. Use [y,x]=midpoint(f,1,0,2,20)
to see all the corresponding $x$-coordinates.

12a: The Octave function we wrote to implement Ralston’s method takes 5 arguments. As explained in the
comment preceding the function declaration,

% INPUT: function f(x,y); interval [a,b]; y(a); steps n              %
% OUTPUT: approximation (x(i),y(i)) of the solution of y'=f(x,y)     %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [y,x] = ralston(f,a,ya,b,n)

they are, in order, (f) the function $f(x, y)$ appearing on the right side of the o.d.e., (a) the $x$-coordinate of the
initial condition, (ya) the $y$-coordinate of the initial condition, (b) the $x$-coordinate of the desired solution,
and (n) the number of steps that should be taken. From the Octave command line, the solution can be found
this way:

>> format('long')
>> f=inline('3*x-2*y')
f = f(x, y) = 3*x-2*y
>> ralston(f,1,1,2,20)
ans =
 Columns 1 through 4:
   1.00000000000000   1.05125000000000   1.10475625000000   1.16030440625000
 Columns 5 through 8:
   1.21770048765625   1.27676894132891   1.33735089190266   1.39930255717191
 Columns 9 through 12:
   1.46249381424058   1.52680690188772   1.59213524620839   1.65838239781859
 Columns 13 through 16:
   1.72546107002583   1.79329226837337   1.86180450287790   1.93093307510450
 Columns 17 through 20:
   2.00061943296957   2.07081058683746   2.14145858108790   2.21252001588455
 Column 21:
   2.28395561437552

The value in Column 21 is the desired result, so $y(2) \approx 2.28395561437552$. The rest of the output gives
approximations for the solution at other points. For example, $y(1.95) \approx 2.21252001588455$. Use [y,x]=ralston(f,1,1,2,20)
to see all the corresponding $x$-coordinates.

12d: The Octave function we wrote to implement Ralston’s method takes 5 arguments. As explained in the
comment preceding the function declaration,

% INPUT: function f(x,y); interval [a,b]; y(a); steps n              %
% OUTPUT: approximation (x(i),y(i)) of the solution of y'=f(x,y)     %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [y,x] = ralston(f,a,ya,b,n)

they are, in order, (f) the function $f(x, y)$ appearing on the right side of the o.d.e., (a) the $x$-coordinate of the
initial condition, (ya) the $y$-coordinate of the initial condition, (b) the $x$-coordinate of the desired solution,
and (n) the number of steps that should be taken. From the Octave command line, the solution can be found
this way:

>> format('long')
>> f=inline('(2*cos(x)^3-1-y*sin(x))/cos(x)')
f = f(x, y) = (2*cos(x)^3-1-y*sin(x))/cos(x)
>> ralston(f,1,0,2,20)
ans =
 Columns 1 through 3:
   0.000000000000000  -0.066770300283373  -0.139836235303672
 Columns 4 through 6:
  -0.218602682516778  -0.302399209595394  -0.390486264578185
 Columns 7 through 9:
  -0.482061961403970  -0.576269242338981  -0.672202937226713
 Columns 10 through 12:
  -0.768915255605024  -0.865412688887274  -0.960565629810385
 Columns 13 through 15:
  -1.056164950925061  -1.150218643526626  -1.240368616917767
 Columns 16 through 18:
  -1.325575733886901  -1.404886704290576  -1.477402196316258
 Columns 19 through 21:
  -1.542278061151791  -1.598731451393269  -1.646047861531770

The value in Column 21 is the desired result, so $y(2) \approx -1.6460478615317$. The rest of the output gives
approximations for the solution at other points. For example, $y(1.95) \approx -1.598731451393$. Use [y,x]=ralston(f,1,0,2,20)
to see all the corresponding $x$-coordinates.


Section 6.4 

1a: The o.d.e. solver previously derived is

$$k_1 = f(t_i, y_i)$$
$$k_2 = f\left(t_i + \tfrac{2h}{3},\; y_i + \tfrac{2h}{3}k_1\right)$$
$$y_{i+1} = y_i + \tfrac{h}{4}\left[k_1 + 3k_2\right]$$

making $\beta_2 = \frac{2}{3}$, $\alpha_1 = \frac{1}{4}$, and $\alpha_2 = \frac{3}{4}$. Plugging these values (plus $\beta_3 = \alpha_3 = 0$) into equations 6.4.4,

$$\tfrac{1}{4} + \tfrac{3}{4} + 0 = 1$$
$$\tfrac{3}{4}\cdot\tfrac{2}{3} + 0 = \tfrac{1}{2}$$
$$\tfrac{3}{4}\left(\tfrac{2}{3}\right)^2 + 0 = \tfrac{1}{3}$$
$$0\cdot\tfrac{2}{3}\cdot 0 \neq \tfrac{1}{6}.$$

Since the only unsatisfied equation was derived from $h^3$ terms, we conclude that this method has local
truncation error $O(h^3)$. The integration formula from which it was derived has local truncation error $O(h^4)$,
so it is not quite as accurate as an o.d.e. solver. However, local truncation error $O(h^3)$ is consistent with the
experimentally determined $O(h^2)$ rate of convergence. In fact, it is this local truncation error that leads to
the $O(h^2)$ rate of convergence.
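The four checks can be done in exact rational arithmetic. The Python sketch below assumes equations 6.4.4 have the form $\alpha_1 + \alpha_2 + \alpha_3 = 1$, $\alpha_2\beta_2 + \alpha_3\beta_3 = \frac{1}{2}$, $\alpha_2\beta_2^2 + \alpha_3\beta_3^2 = \frac{1}{3}$, $\alpha_3\beta_2\beta_3 = \frac{1}{6}$. This is inferred from the right-hand sides and parameter values used in these solutions, so check it against the text before relying on it. The function name `order_conditions` is illustrative.

```python
from fractions import Fraction as F

def order_conditions(a1, a2, a3, b2, b3):
    """Return, for each assumed equation of 6.4.4, whether it is satisfied."""
    pairs = [
        (a1 + a2 + a3,             F(1)),
        (a2*b2 + a3*b3,            F(1, 2)),
        (a2*b2**2 + a3*b3**2,      F(1, 3)),
        (a3*b2*b3,                 F(1, 6)),   # the condition arising from h^3 terms
    ]
    return [lhs == rhs for lhs, rhs in pairs]

# Solver from 6.3 exercise 1a: alpha = (1/4, 3/4, 0), beta_2 = 2/3, beta_3 = 0
checks_1a = order_conditions(F(1, 4), F(3, 4), F(0), F(2, 3), F(0))
# Solver from 6.3 exercise 1e: alpha = (0, 3/4, 1/4), beta_2 = 1/3, beta_3 = 1
checks_1e = order_conditions(F(0), F(3, 4), F(1, 4), F(1, 3), F(1))
print(checks_1a)
print(checks_1e)
```

Under that assumption, both solvers satisfy the first three equations and fail only the last one.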


1e: The o.d.e. solver previously derived is

$$k_1 = f(t_i, y_i)$$
$$k_2 = f\left(t_i + \tfrac{h}{3},\; y_i + \tfrac{h}{3}k_1\right)$$
$$k_3 = f(t_i + h,\; y_i + hk_2)$$
$$y_{i+1} = y_i + \tfrac{h}{4}\left[3k_2 + k_3\right]$$

making $\beta_2 = \frac{1}{3}$, $\beta_3 = 1$, $\alpha_1 = 0$, $\alpha_2 = \frac{3}{4}$, and $\alpha_3 = \frac{1}{4}$. Plugging these values into equations 6.4.4,

$$0 + \tfrac{3}{4} + \tfrac{1}{4} = 1$$
$$\tfrac{3}{4}\cdot\tfrac{1}{3} + \tfrac{1}{4}\cdot 1 = \tfrac{1}{2}$$
$$\tfrac{3}{4}\left(\tfrac{1}{3}\right)^2 + \tfrac{1}{4}\cdot 1^2 = \tfrac{1}{3}$$
$$\tfrac{1}{4}\cdot\tfrac{1}{3}\cdot 1 \neq \tfrac{1}{6}.$$

Since the only unsatisfied equation was derived from $h^3$ terms, we conclude that this method has local
truncation error $O(h^3)$. The integration formula from which it was derived has local truncation error $O(h^4)$,
so it is not quite as accurate as an o.d.e. solver. However, local truncation error $O(h^3)$ is consistent with the
experimentally determined $O(h^2)$ rate of convergence. In fact, it is this local truncation error that leads to
the $O(h^2)$ rate of convergence.


2: From the initial value problem, $f(t, y) = ty$ and $y(1) = \frac{1}{2}$. For the o.d.e. solver, this means $t_0 = 1$ and $y_0 = \frac{1}{2}$.
To compute $y(2)$ in one step, $h = 1$ and

$$k_1 = f(t_0, y_0) = 1\cdot\tfrac{1}{2} = \tfrac{1}{2}$$
$$k_2 = f\left(t_0 + \tfrac{1}{2}h,\; y_0 + \tfrac{1}{2}hk_1\right) = \tfrac{3}{2}\cdot\tfrac{3}{4} = \tfrac{9}{8}$$
$$k_3 = f\left(t_0 + \tfrac{1}{2}h,\; y_0 + \tfrac{1}{2}hk_2\right) = \tfrac{3}{2}\cdot\tfrac{17}{16} = \tfrac{51}{32}$$
$$k_4 = f(t_0 + h,\; y_0 + hk_3) = 2\cdot\tfrac{67}{32} = \tfrac{67}{16}$$
$$y_1 = y_0 + \tfrac{1}{6}h(k_1 + 2k_2 + 2k_3 + k_4) = \tfrac{35}{16} = 2.1875$$
$$t_1 = t_0 + h = 1 + 1 = 2.$$

Thus $y(2) \approx 2.1875$. Euler’s method with two steps yielded $y(2) \approx 1.3125$. Since the exact solution is
$y(2) = \frac{1}{2}e^{3/2} \approx 2.240844535169032$, RK4 did a much better job in one step than did Euler’s method in two
steps. Incidentally, even four steps of Euler’s method (which means 4 function evaluations, just as many as
one step of RK4) yields $y(2) \approx 1.621398925781250$.
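Because every quantity in this problem is rational, the arithmetic can be confirmed exactly. The Python sketch below uses the standard library's `Fraction` type. The function names `rk4_step` and `euler` are illustrative.

```python
from fractions import Fraction as F

def f(t, y):
    return t*y

def rk4_step(t, y, h):
    """One step of the classical fourth-order Runge-Kutta method."""
    k1 = f(t, y)
    k2 = f(t + h/2, y + h/2*k1)
    k3 = f(t + h/2, y + h/2*k2)
    k4 = f(t + h, y + h*k3)
    return y + h/6*(k1 + 2*k2 + 2*k3 + k4)

def euler(t, y, h, n):
    """n steps of Euler's method with step size h."""
    for _ in range(n):
        y = y + h*f(t, y)
        t = t + h
    return y

rk4 = rk4_step(F(1), F(1, 2), F(1))          # one step, h = 1
euler2 = euler(F(1), F(1, 2), F(1, 2), 2)    # two steps, h = 1/2
euler4 = euler(F(1), F(1, 2), F(1, 4), 4)    # four steps, h = 1/4
print(rk4, float(rk4))
print(float(euler2), float(euler4))
```

Comparing these against $\frac{1}{2}e^{3/2}$ shows how much closer the single RK4 step lands for the same number of function evaluations as four Euler steps.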


Section 6.5 

4: The blanks in the table are to be read as zeros, so $\beta_{11} = \beta_{12} = 0$, for example. The only non-zero value for the
$\beta_{ij}$ is $\beta_{21} = 1$. The values in the left column are the $\delta_i$, so $\delta_2 = 1$. The values in the bottom row are the $\alpha_i$,
so $\alpha_1 = \alpha_2 = \frac{1}{2}$. In summary,

$$\delta_2 = 1, \quad \beta_{21} = 1, \quad \alpha_1 = \alpha_2 = \tfrac{1}{2}.$$

Because the tableau has two rows above the row of $\alpha_i$, it is a two-stage method. Therefore, the method takes
the form

$$k_1 = f(t_i, y_i)$$
$$k_2 = f(t_i + \delta_2 h,\; y_i + \beta_{21}hk_1)$$
$$y_{i+1} = y_i + h[\alpha_1 k_1 + \alpha_2 k_2].$$

See equation 6.5.2. Plugging in the parameter values, this tableau represents the method

$$k_1 = f(t_i, y_i)$$
$$k_2 = f(t_i + h,\; y_i + hk_1)$$
$$y_{i+1} = y_i + h\left[\tfrac{1}{2}k_1 + \tfrac{1}{2}k_2\right].$$

This last equation simplifies to $y_{i+1} = y_i + \frac{h}{2}[k_1 + k_2]$. These equations are exactly those in equation 6.3.3,
trapezoidal-ode, or the improved Euler method.
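Decoding a tableau is mechanical enough to automate. The Python sketch below performs one explicit Runge-Kutta step directly from the three pieces of a tableau. The function name `rk_step` and its argument layout (left column `delta`, lower-triangular rows `beta`, bottom row `alpha`) are choices made here, not notation from the text.

```python
def rk_step(f, t, y, h, delta, beta, alpha):
    """One explicit Runge-Kutta step defined by a Butcher tableau.

    delta: left column; beta: rows of stage coefficients (only entries to the
    left of the diagonal are used); alpha: bottom row of weights.
    """
    k = []
    for s in range(len(alpha)):
        # Each stage uses only the slopes computed before it
        y_stage = y + h*sum(beta[s][j]*k[j] for j in range(s))
        k.append(f(t + delta[s]*h, y_stage))
    return y + h*sum(a*kj for a, kj in zip(alpha, k))

# Tableau from this exercise: delta_2 = 1, beta_21 = 1, alpha_1 = alpha_2 = 1/2
delta = [0.0, 1.0]
beta = [[0.0, 0.0],
        [1.0, 0.0]]
alpha = [0.5, 0.5]

# One step of size 0.05 on y' = 3t - 2y from (1, 1), as in 6.3 exercise 9a
y1 = rk_step(lambda t, y: 3*t - 2*y, 1.0, 1.0, 0.05, delta, beta, alpha)
print(y1)
```

The result matches the second entry of the trapode output in the solution of 6.3 exercise 9a, as expected if the tableau encodes trapezoidal-ode.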

6b: First, decoding the table into the form 6.5.2, we see this is a 4-stage method with formula

$$k_1 = f(t_i, y_i)$$
$$k_2 = f\left(t_i + \tfrac{2}{7}h,\; y_i + \tfrac{2}{7}hk_1\right)$$
$$k_3 = f\left(t_i + \tfrac{4}{7}h,\; y_i - \tfrac{8}{35}hk_1 + \tfrac{4}{5}hk_2\right)$$
$$k_4 = f\left(t_i + \tfrac{6}{7}h,\; y_i + \tfrac{29}{42}hk_1 - \tfrac{2}{3}hk_2 + \tfrac{5}{6}hk_3\right)$$
$$y_{i+1} = y_i + h\left[\tfrac{1}{6}k_1 + \tfrac{1}{6}k_2 + \tfrac{5}{12}k_3 + \tfrac{1}{4}k_4\right].$$

Code similar to the samples in sections 6.3 and 6.4 might look like thirdOrder.m, which may be downloaded
at the companion website.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Written by Leon Brin                                  9 June 2016  %
% Purpose: This function implements a 3rd order Runge-Kutta          %
%          method where the step size is calculated and held         %
%          constant.                                                 %
% INPUT: function f(x,y); interval [a,b]; y(a); steps n              %
% OUTPUT: approximation (x(i),y(i)) of the solution of y'=f(x,y)     %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [y,x] = thirdOrder(f,a,ya,b,n)
  i = 1;
  x(i) = a;
  y(i) = ya;
  h = (b-a)/n;
  while (i<=n)
    k1 = f(x(i), y(i));
    k2 = f(x(i)+2*h/7, y(i)+2*h/7*k1);
    k3 = f(x(i)+4*h/7, y(i)+h/35*(-8*k1+28*k2));
    k4 = f(x(i)+6*h/7, y(i)+h/42*(29*k1-28*k2+35*k3));
    y(i+1) = y(i) + h/12*(2*k1+2*k2+5*k3+3*k4);
    x(i+1) = a + (b-a)*i/n;
    i = i+1;
  end%while
end%function


Applying this code to the test o.d.e. used in section 6.3,

$$\dot y = -\frac{y}{t} + t^2, \qquad y(4) = 20,$$

to approximate $y(2)$, which we know has exact value 10, with various step sizes yields

>> format('long')
>> f=inline('-y/t+t^2')
f = f(t, y) = -y/t+t^2
>> [y,x]=thirdOrder(f,4,20,2,5);
>> abs(10-y(length(y)))
ans = 4.14600417808941e-04
>> [y,x]=thirdOrder(f,4,20,2,10);
>> abs(10-y(length(y)))
ans = 5.20403883292886e-05
>> [y,x]=thirdOrder(f,4,20,2,20);
>> abs(10-y(length(y)))
ans = 6.48395888624975e-06
>> [y,x]=thirdOrder(f,4,20,2,40);
>> abs(10-y(length(y)))
ans = 8.08029787080500e-07


Since the number of steps is doubling from one call of thirdOrder to the next, the step size is halving. As
the step size is halved, the error is decreasing by a factor of about 8, that is, it is multiplied by about $\left(\frac{1}{2}\right)^3$,
lending numerical evidence that the rate of convergence is $O(h^3)$.
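The factor of 8 can be turned into an observed order by taking $\log_2$ of consecutive error ratios. Below is a Python port of the same experiment. The name `third_order` and the decision to return only the final value are simplifications made for this sketch.

```python
import math

def third_order(f, a, ya, b, n):
    """Final value after n steps of the 4-stage method decoded in 6b."""
    h = (b - a)/n
    x, y = a, ya
    for i in range(n):
        k1 = f(x, y)
        k2 = f(x + 2*h/7, y + 2*h/7*k1)
        k3 = f(x + 4*h/7, y + h/35*(-8*k1 + 28*k2))
        k4 = f(x + 6*h/7, y + h/42*(29*k1 - 28*k2 + 35*k3))
        y = y + h/12*(2*k1 + 2*k2 + 5*k3 + 3*k4)
        x = a + (b - a)*(i + 1)/n
    return y

f = lambda t, y: -y/t + t**2
# Same step counts as the Octave session; the exact value of y(2) is 10
errors = [abs(10 - third_order(f, 4.0, 20.0, 2.0, n)) for n in (5, 10, 20, 40)]
orders = [math.log2(errors[j]/errors[j+1]) for j in range(3)]
print(errors)
print(orders)
```

Observed orders close to 3 are the numerical evidence referred to above.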

10: First, decoding the table into the form 6.5.2, we see the embedded methods have 5 and 4 stages with formulas

$k_1 = f(t_i, y_i)$
$k_2 = f\left(t_i + \frac{1}{4}h,\; y_i + \frac{1}{4}hk_1\right)$
$k_3 = f\left(t_i + \frac{3}{4}h,\; y_i - \frac{9}{4}hk_1 + 3hk_2\right)$
$k_4 = f\left(t_i + \frac{1}{2}h,\; y_i + \frac{1}{18}hk_1 + \frac{5}{12}hk_2 + \frac{1}{36}hk_3\right)$
$k_5 = f\left(t_i + h,\; y_i + \frac{7}{9}hk_1 - \frac{5}{3}hk_2 - \frac{1}{9}hk_3 + 2hk_4\right)$
{first method} $y_{i+1} = y_i + h\left[\frac{1}{6}k_1 + \frac{2}{3}k_4 + \frac{1}{6}k_5\right]$
{second method} $y_{i+1} = y_i + h\left[\frac{7}{9}k_1 - \frac{5}{3}k_2 - \frac{1}{9}k_3 + 2k_4\right]$

The difference of the two methods will be used as an error estimate:

$\text{error} \approx h\left[\frac{1}{6}k_1 + \frac{2}{3}k_4 + \frac{1}{6}k_5\right] - h\left[\frac{7}{9}k_1 - \frac{5}{3}k_2 - \frac{1}{9}k_3 + 2k_4\right] = \frac{h}{18}\left[-11k_1 + 30k_2 + 2k_3 - 24k_4 + 3k_5\right].$

Since we are told this is an RK3(4) method, it has rate of convergence (order) 3 and therefore has local
truncation error $O(h^4)$. This means the error will scale with the fourth power of $h$. This is important when
adjusting the step size. We will need to use a fourth root, not a third root as in RK2(3). Besides this change
and the formula changes, the code of RK2(3) can be shared. rk34butcher.m may be downloaded at the
companion website.

% Written by Leon Brin                                     9 June 2016
% Purpose: This function implements an adaptive rk3(4) method of
%          Butcher where the step size is controlled by the routine.
% INPUT: function f(x,y); interval [a,b]; y(a); initial step
%        size h; tolerance eps; maximum steps N;
% OUTPUT: approximation (x(i),y(i)) of the solution of y'=f(x,y)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [y,t] = rk34butcher(f,a,ya,b,h,eps,N)
  i = 1;
  t(i) = a;
  y(i) = ya;
  done = 0;
  while (!done && i<=N)
    if ((b-t(i)-h)*(b-a)<=0)
      h=b-t(i);
      done = 1;
    endif
    k1 = f(t(i), y(i));
    k2 = f(t(i)+h/4, y(i)+h/4*k1);
    k3 = f(t(i)+3*h/4, y(i)+h/4*(-9*k1+12*k2));
    k4 = f(t(i)+h/2, y(i)+h/36*(2*k1+15*k2+k3));
    k5 = f(t(i)+h, y(i)+h/9*(7*k1-15*k2-k3+18*k4));
    err = abs(h/18*(-11*k1+30*k2+2*k3-24*k4+3*k5));
    if (done || err<=eps)
      y(i+1) = y(i) + h/6*(k1+4*k4+k5);
      t(i+1) = t(i) + h;
      if (t(i+1) == t(i))
        disp("Procedure failed. Step size reached zero.")
        return
      endif
      i = i+1;
    endif
    q = 0.9*realpow(eps/err, 1/4);
    q = max(q, 0.1);
    q = min(5.0, q);
    h = q*h;
  end%while
  if (!done)
    disp("Procedure failed. Maximum number of iterations reached.")
  endif
end%function

12b: The method of exercise 6c shares the first three stages with this method. All we need to do is append the 
line of at values from that table to this one, noting that we need to add a zero at the end: 


2 14 

I ?! i 

¥ f I a 

9 3 9 

15a: There are two difficulties with this problem. The more straightforward of the two is knowing what the error
of the approximation really is. This o.d.e. is not solvable exactly, so we can’t compute the exact solution. We
can certainly run the method with a tolerance of $10^{-4}$, but this only controls the local truncation error. It does
not necessarily translate into any estimate of the global error (the total accumulated error at the last step).
Oftentimes they will be similar in magnitude, but there is no guarantee of it. In any case, here are the
results of running the method with initial step size $\frac{1}{10}$ and tolerance $10^{-4}$:

>> f=inline('(x+2*exp(y)*cos(exp(x)))/(1+exp(y))')
f = f(x, y) = (x+2*exp(y)*cos(exp(x)))/(1+exp(y))
>> [y,x]=rk23(f,0,2,4,1/10,1e-4,100000);
>> y(length(y))
ans = 2.37564101044550
>> length(y)
ans = 152

[Figure 6.5.1: log-log plot of tolerance versus global error for RK2(3); horizontal axis: tolerance]

suggesting that $y(4) \approx 2.37564$. Though we should have some confidence that this is a reasonable estimate
(say with error no more than $10^{-2}$), we should certainly not claim that the error is less than, or really all that
close to, $10^{-4}$. The algorithm took 152 steps to arrive at the result, so the error had a chance to accumulate.
If it is extremely important to know that the estimate is accurate to within $10^{-4}$ or better, it could be
compared to a second run with a smaller tolerance:

>> [y,x]=rk23(f,0,2,4,1/10,1e-5,100000);
>> y(length(y))
ans = 2.37616344347848

The difference between the estimates is about $5.22(10)^{-4}$. This would suggest that the error in the first
estimate is likely a bit more than $10^{-4}$. But even this evidence is far from iron-clad. The second difficulty
is that small adjustments in the tolerance can lead to large changes in the global error. Global error as a
function of tolerance is very rough and discontinuous (see Figure 6.5.1). The oscillatory nature of the solution
exacerbates this problem with adaptive Runge-Kutta methods. If the global error scaled perfectly with the
truncation error, Figure 6.5.1 would show a perfectly straight line parallel to the line $y = x$, shown in red.
This figure shows that most tolerances between $10^{-5}$ and $10^{-3}$ would suffice to give a global error of $10^{-4}$ or
less, though there are some exceptions, most notably one right around $10^{-4}$. Figure 6.5.2 shows the solution
over the interval [0,4], illustrating its oscillations. Generally speaking, comparing multiple approximations
using different tolerances is not how global error is controlled. Global error can be reasonably well controlled
by scaling the tolerance relative to the step size as the solution progresses or by using relative errors instead of
absolute ones. Either way, this concern adds another layer of complexity to the method.
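
The comparison of the two runs amounts to one subtraction. As a minimal sketch in Octave, with the two reported final values typed in by hand (the variable names are assumptions for this illustration):

```octave
% Final values reported above for tolerances 1e-4 and 1e-5.
y_tol4 = 2.37564101044550;
y_tol5 = 2.37616344347848;

% The gap between the runs is a rough proxy for the global error of the
% less accurate run. It is evidence, not a guaranteed bound.
gap = abs(y_tol5 - y_tol4);

printf("difference between runs: %.3e\n", gap);
printf("exceeds 1e-4: %d\n", gap > 1e-4);
```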

[Figure 6.5.2: Solution of equation 6.5.4; horizontal axis: x]

16a: There are two difficulties with this problem. The more straightforward of the two is knowing what the error
of the approximation really is. This o.d.e. is not solvable exactly, so we can’t compute the exact solution. We
can certainly run the method with a tolerance of $10^{-4}$, but this only controls the local truncation error. It does
not necessarily translate into any estimate of the global error (the total accumulated error at the last step).
Oftentimes they will be similar in magnitude, but there is no guarantee of it. In any case, here are the
results of running the method with initial step size $\frac{1}{10}$ and tolerance $10^{-4}$:

>> f=inline('(x^2+y)/(x-y^2)')
f = f(x, y) = (x^2+y)/(x-y^2)
>> [y,x]=rk23(f,0,5,3,1/10,1e-4,100000);
>> y(length(y))
ans = 3.66765768487404
>> length(y)
ans = 17

suggesting that $y(3) \approx 3.66765$. Though we should have some confidence that this is a reasonable estimate
(say with error no more than $10^{-2}$), we should certainly not claim that the error is less than, or really all
that close to, $10^{-4}$. The algorithm took 17 steps to arrive at the result, so the error had a small chance to
accumulate. If it is extremely important to know that the estimate is accurate to within $10^{-4}$ or better,
it could be compared to a second run with a smaller tolerance:

>> [y,x]=rk23(f,0,5,3,1/10,1e-5,100000);
>> y(length(y))
ans = 3.66757804370410

The difference between the estimates is about $7.96(10)^{-5}$. This would suggest that the error in the first
estimate is likely right around $10^{-4}$. But even this evidence is far from iron-clad. The second difficulty
is that small adjustments in the tolerance can lead to large changes in the global error. Global error as a
function of tolerance is rough and discontinuous (see Figure 6.5.3). If the global error scaled perfectly with
the truncation error, Figure 6.5.3 would show a perfectly straight line parallel to the line $y = x$, shown in red.
This figure shows that most tolerances between $10^{-5}$ and $10^{-3}$ would suffice to give a global error of $10^{-4}$ or
less, though there may be some exceptions not plotted. Figure 6.5.4 shows the solution over the interval [0, 3].
Generally speaking, comparing multiple approximations using different tolerances is not how global error is
controlled. Global error can be reasonably well controlled by scaling the tolerance relative to the step size
as the solution progresses or by using relative errors instead of absolute ones. Either way, this concern adds
another layer of complexity to the method.
[Figure 6.5.3: log-log plot of tolerance versus global error for RK2(3); horizontal axis: tolerance]

[Figure 6.5.4: Solution of equation 6.5.5]
Answers to Selected Exercises


Section 1.1 


10e: 0.83333 


14a: .2353263818643 and .2343263818643 


15a: .2349438537911 and .2347090273506 


16: (p,p) G {(f ^ 


K 3 ’ 300 / ’ 


(1 103 \ 
l 3 ’ 300 / ’ 


__9L) 

V 3 J 300 / ’ 


(_1 — 103 \ 1 

V 3 5 3nn / I 


21: p = dzl and p is anything; or p = p ^ 0. 


24a: (i) 8.99999974990351 (ii) $2.5009649(10)^{-7}$ (iii) $2.7788499(10)^{-8}$ (iv) $(10)^{-14}$ (v) $2.5009647(10)^{-7}$


Section 1.2 

If: T 3 (x) = x 2 . R 3 (x) = 4. 

9d: 10.760 

12a: £(tt) = cos" 1 ( 1 ^ M ) « 0.7625. 


Section 1.3 

Id: a = 1 
6f: 0( i) 

6h: o (*) 

6n: O(s) 

19e: 4 iterations 


Section 1.4 

7: (a) 1 more than 4 times the number required for the $2^{n-1} \times 2^{n-1}$ grid, (b) 0 (c) 0
14: (a) $S(n-1, k-1)$ (b) $k \cdot S(n-1, k)$


Section 2.1 


4c: In 27 iterations, we get 0.666666664928, which is within $10^{-8}$ of an actual root.
4f: In 27 iterations, we get 21.9911485687, which is within $10^{-8}$ of an actual root.
5: (a) 0.625 (b) 1.09375

10: $\frac{37\pi}{2}$

16: 33

23: One possible collatz.m file is

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Written by              on
% Purpose: implementation of the collatz function
% INPUT: integer n
% OUTPUT: n/2 or 3n+1 depending on whether n is
%         even or odd
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function res=collatz(n)
  if (ceil(n/2)==n/2)
    res=n/2
  else
    res=3*n+1
  end%if
end%function
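
As a usage sketch in Octave, the rule can be iterated to trace a trajectory. The helper collatzStep below is a hypothetical variant written for this illustration: it applies the same rule but tests parity with mod and ends its assignments with semicolons so intermediate values are not echoed. The starting value 6 is an arbitrary choice.

```octave
1;  % marks this file as a script, so a function may be defined below

% Same rule as collatz(): n/2 for even n, 3n+1 for odd n.
function res = collatzStep(n)
  if (mod(n, 2) == 0)
    res = n/2;
  else
    res = 3*n + 1;
  end
end

% Follow the trajectory of a starting value until it reaches 1.
n = 6;
traj = n;
while (n != 1)
  n = collatzStep(n);
  traj(end+1) = n;
end
disp(traj)   % values visited: 6 3 10 5 16 8 4 2 1
```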


25: (a) v / 20^ : 


Section 2.2 

2d: (i) The hypotheses of the MVT are met. (ii) $c \approx -2.540793513382845$.

2g: (i) The hypotheses of the MVT are not met.

2h: (i) The hypotheses of the MVT are met. (ii) $c \approx 17.41987374102208$.

3c: —2 and 5 
3d: —1 and — | 

4c: fi(x) = \J 4 ~2 X2 an d fz(x) = \J 4-8x5 . There are many others. 

4f: fi{x) = --^+i)(iog 3 .u-j -i anc j j 2 (x) = \J (log 2 3)(a; 2 — 5x + 1) — 5x — 1. There are many others. 

5c: 1.326008542399018, 1.598751095046933, 1.737721941251104, 1.779703798972744, 1.788512049183622; the
sequence seems to be converging.

6c: 1.79047660196506

7c: The web diagram over [.8,2] is: [web diagram figure]

18: (a) 15 (b) The equations $g(x) = x$ and $f(x) = x$ are equivalent.

23: 

Section 2.3 

10: (a) 15. HINT: It is valid to bound the derivative over the interval [1.618033988749895, 2.5] instead of the entire 
interval [.5,3.5]. Why? On the other hand, if you do consider the whole interval [.5,3.5], you get a bound of 
43. (b) It actually takes 15 iterations. 

13: $a_1 \approx 1.942415717$ and $a_2 \approx 1.623271404$

14: 2.732050809. HINT: use /( x) = </2x 3 + 4x 2 4x 1. Why? 

15: ao = 3, ai = |, and a 2 = | 

18: No. Aitken’s delta-squared method is designed to speed up linearly convergent sequences, not superlinearly 
convergent sequences. 

21: $a_1 \approx 2.152904629$ and $a_2 \approx 1.873464044$

23: | or 0 

24: $x \approx 5.259185715$

Section 2.4 

4c: Using $x_0 = 2$ and $x_1 = 3$, we find $x_8 = 1.47883214766643$.

4d: Using $x_0 = 3$ and $x_1 = 4$, we find $x_{10} = 0.948434069243393$.

5c: Using Xo = 2.5, we find Xq = 1.47883214733021. 

5d: Using x 0 = 3.5, we find Xr = 0.948434069919634. 

6c: Using Xq = 2.5, we find aq 8 = 0.948434068437721. 

6d: Using xo = 3.5, we find *i 5 = 0.948434069313413. 

7c: Using $x_0 = 2$ and $x_1 = 3$, we find $x_{10} = 1.47883214733021$. The difference between $x_{10}$ and $x_8$ is about
$3.3(10)^{-10}$, so $x_8$ was indeed accurate to within $10^{-5}$.

7d: Using $x_0 = 3$ and $x_1 = 4$, we find $x_{12} = 0.948434069919636$. The difference between $x_{12}$ and $x_{10}$ is about
$6.7(10)^{-10}$, so $x_{10}$ was indeed accurate to within $10^{-5}$.

9b: Xu = 0.580055888962675. 

15b: X 14 = 0.580055888962675. This is different from 0. Why? 
16: $x_{10} = 3.739599358563032$.

20: $x_{16} = 3.7201766622615984(10)^{-4}$, $x_{17} = 2.4933434933779863(10)^{-4}$, and $x_{18} = 1.6752024023472534(10)^{-4}$.
$a_{16} = 3.7404947721983783(10)^{-6}$ so Aitken’s delta-squared method DOES speed up convergence.

23: (a) n (b) Newton’s method will fail because </(0) = 0. (c) 6 (d) Something near —7.5 will do. 

25c: In 18 iterations, we get 0.666666668082383, which is within $10^{-8}$ of an actual root. This is the same root
found by the bisection method, but the bisection method took longer, 27 iterations.

25f: In 9 iterations, we get 21.9911485743912, which is within $10^{-8}$ of an actual root. This is the same root found
by the bisection method, but the bisection method took longer, 27 iterations.

31: 3.555963292212723 

Section 2.5 

2: f and (a), g and (d), h and (b), l and (c)

8: f and (b), g and (c), h and (d), l and (a)

Section 2.6 

6b: g( 2) = 5 and g'( 2) = —8 
8b: x\ = ^ and x 2 = 

14b: -8, -2.33333, 0.33333, 2 + i, 2 - i 

15b: -2, 0.76393, 5.23607, 0.66667 + 0.57735i, 0.66667 - 0.57735i

16b: They do change, but not within the first five decimal places. 

19b: (i) -109.372462336481 (ii) -109.372462336481 (iii) ans = 0 
19c: (i) 948.990683139955 (ii) 948.990683139955 (iii) ans = 1 
20 : 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Written by Dr. Len Brin          15 January 2014
% Purpose: Implementation of Newton's Method
%          for polynomials of the form
%          p(x) = c1 + c2*x + c3*x^2 + ... + c(n+1)*x^n
%          using Horner's Method, n > 1.
% INPUT: coefficients c; tolerance tol; maximum
%        number of iterations N
% OUTPUT: approximations to all roots, roots
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function roots = newthornall(c,tol,N,x0)
  n=length(c)-1;
  for i=1:n-2
    res=newtonhorner(c,x0,tol,N)
    roots(i)=res;
    x0=roots(i);
    c=deflate(c,x0);
  end%for
  [roots(n-1),roots(n)]=quadraticRoots(c(3),c(2),c(1));
end%function


Remark: This code is often successful, but can easily come up empty. For example

newthornall([56,-152,140,-17,-48,9],1e-5,100,2)

returns

res = 0.763932022500484
res = 5.23606797749979
res = Method failed maximum number of iterations reached
error: newthornall: A(I) = X: X must have the same size as I
error: called from:
error: .../newthornall.m at line 16, column 13

It fails to come up with the third real root, -2. After finding the first two roots, the polynomial has
been deflated to

$14.00000000000065 - 16.99999999999987x + 6.00000000000002x^2 + 9.00000000000000x^3.$

With this cubic and initial value 5.23606797749979, Newton’s method does not converge to -2. On the
other hand, newthornall([56,-152,140,-17,-48,9],1e-5,100,-2) returns

res = -2
res = 0.763932022500211
res = 5.23606797749979
ans =

 Columns 1 and 2:

  -2.000000000000000 + 0.000000000000000i   0.763932022500211 + 0.000000000000000i

 Columns 3 and 4:

   5.236067977499789 + 0.000000000000000i   0.666666666666667 + 0.577350269189623i

 Column 5:

   0.666666666666667 - 0.577350269189623i

Having found -2 first, it has no problem finding the other roots.
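
The deflated cubic quoted in the remark can be reproduced independently. The sketch below is in Octave and uses the built-in deconv for polynomial division, which is a stand-in chosen for this illustration and not the deflate routine called by newthornall. It relies on the observation that the first two roots found, 0.76393... and 5.23606..., are $3 - \sqrt{5}$ and $3 + \sqrt{5}$, the roots of $x^2 - 6x + 4$.

```octave
% Coefficients in polyval order (highest power first). newthornall takes
% them in the opposite order, constant term first.
p = [9, -48, -17, 140, -152, 56];

% Divide out the quadratic factor whose roots are 3 - sqrt(5) and 3 + sqrt(5).
[q, r] = deconv(p, [1, -6, 4]);

disp(q)                 % deflated cubic 9x^3 + 6x^2 - 17x + 14
disp(r)                 % remainder, all zeros
disp(polyval(q, -2))    % -2 is a root of the deflated cubic
```

Up to rounding, q matches the cubic displayed in the remark, so the deflation itself is sound; the failure there comes from the starting value handed to Newton’s method.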


21: (a) 1.5858, -13, 4.4142, -2 + 2.2361i, -2 - 2.2361i

(b) 3 - 1.4142i, -2.6, -2 + 2.2361i, -2 - 2.2361i, 3 + 1.4142i


Section 2.7 

1: (a) x 4 = 2.1806 (e) x w = -502.19 (j) x 3 = 1.0079 
2: (a) x 5 = 2.1798 (e) x a = — 

3: (a) x 7 = 2.1798 (e) x§ = — 


502.19 (j) x 6 = 1.0079 
499.98 (j) x 5 = 1.0080 
4: (a) $x_7 = 2.1798$ (e) $x_2 = -499.98$ (j) $x_3 = 4.1495$

5: (a) $x_9 = 2.17975713685875$ (e) $x_{18} = -502.188059117229$ (j) $x_4 = 1.00794427892360$
6: (a) $x_6 = 2.17975706647997$ (e) $x_{10} = -502.188059235320$ (j) $x_8 = 1.00794427848101$
7: (a) $x_6 = 2.17975706647996$ (e) $x_9 = -502.188059235320$ (j) $x_4 = 1.00794427848094$

8: (a), (e), and (j): Bracketed inverse quadratic interpolation is at least as fast or faster than false position or 
bracketed Newton’s method. 

9: bracketedSteffensens.m may be downloaded at the companion website. 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Written by Dr. Len Brin          15 January 2014
% Purpose: Implementation of Steffensen's method
% INPUT: function f; initial value x0; tolerance
%        TOL; maximum iterations N0
% OUTPUT: approximation x and number of
%         iterations i; or message of failure
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [x,i] = bracketedSteffensens(f,a,b,TOL,N0)
  i=1;
  A=f(a);
  B=f(b);
  while (i<=N0)
    b
    x0=b;
    x1=B;
    x2=f(x1);
    if (abs(x2-x1)<TOL)
      x=x2;
      disp(" ");
      return
    end%if
    x=x0-(x1-x0)^2/(x2-2*x1+x0);
    if (x<min([a,b]) || x>max([a,b]))
      x=a+(b-a)/2;
    end%if
    if (abs(x-x2)<TOL)
      disp(" ");
      return
    end%if
    X=f(x);
    if ((B<b && X>x) || (B>b && X<x))
      a=b; A=B;
    end%if
    b=x; B=X;
    i=i+1;
  end%while
  x="Method failed maximum number of iterations reached";
end%function


10: (a) $x_6 = 2.17975706643814$ (e) $x_{41} = -502.188059386686$ (j) $x_9 = 1.00794427672537$

11: (a), (e), and (j): Bracketed inverse quadratic interpolation is at least as fast or faster than bracketed Steffensen’s 
method, counting only number of iterations. However, bracketed Steffensen’s requires two function evaluations 
per iteration, so for all practical purposes requires more than twice the computational power of bracketed 
inverse quadratic interpolation. 
13: (a) $x_6 = 2.17975706647996$ (e) $x_8 = -502.188059227438$ (j) $x_4 = 1.00794427848094$

14: (a) and (j): Since the root is on the order of 1, there is no difference between testing absolute and relative
errors. (e) Since the root is around five hundred, the method stops when the absolute error is only about
$10^{-6} \cdot 500 = 5(10)^{-4}$. Consequently, the method stops one iteration earlier when checking relative error than
it does checking absolute error.
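
The arithmetic behind this answer can be sketched in Octave as follows. The form of the relative test in the comment, a change of at most tol times the size of the iterate, is an assumption made for this illustration; the roots and the tolerance $10^{-6}$ are taken from the answers above.

```octave
% Root near -502.188 from part (e) and the tolerance 1e-6.
root = -502.188059235320;
tol  = 1e-6;

% A relative test of the form |x_new - x_old| <= tol*|x_new| accepts any
% absolute change up to tol*|root|, far looser than tol when |root| is large.
abs_allowed = tol * abs(root);
printf("absolute change accepted near -502: %.3e\n", abs_allowed);

% For a root of order 1, as in parts (a) and (j), the two tests nearly coincide.
root_small = 1.00794427848094;
abs_allowed_small = tol * abs(root_small);
printf("absolute change accepted near 1:    %.3e\n", abs_allowed_small);
```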

Section 3.2 

15: 4 

16: 3, 18±^42 J or 1S-VI42 

21: 8 

Section 3.3 

5: $P_2(x) = -0.001642458785316x^2 + 1.64927376355948x + 10$
8: $P_3(x) = 2x - 1$. Is degree 1 what you expected?

14: (a) (b) $8.7364(10)^{-5} \max_{x \in [8.1,8.7]} |f^{(4)}(x)|$ (c) .52501

15: $0.5x^3 + 1.5x^2 + 0.335x + 0.951$

19: f and (b), g and (c), h and (d), l and (a)

Section 4.1 

2cc: f (x 0 + |) « n*°+ h l~f(*o) 

3cc: /' (x 0 + §) « ^ x o+h)-f(x 0 ) 

6: (a) 20.32712878304436 (e) 0.6321205268681437 (g) 0.2325441461772505 

7: (a) (i) $e^3 - e^{-4}$ (ii) 0.2599074987454273 (e) (i) $1 - e^{-1}$ (ii) $3.196041398201288(10)^{-8}$
(g) (i) $\ln 2$ (ii) $1.2110575916990385(10)^{-5}$

8b: 1.19336533331362 

9b: .19336533331362 

11b: 35
Ilf: — 

- L - L1 ' 15 

12a: -1 
12c: -23 

13b: f'(x 0 + 3 h) » 7/Co+2/d-i5/Co)+8/(tt 0 -fr) 

13f: f(xo - ft) » -/(»o+2h)+9/(xo)-8/(xo->i) 

13h- f'(xo) ~ -f( x °+ 3h )+ 9 f( x o+ h ’)~ s f( x o) 

14c: f(x)dx « - | • ((20 2 - 9 1 - 0o)f(x o + 9 3 h ) - (20 3 - 0i - 9 0 )f(x 0 + 0 2 h)) 

15a: fx° +2h f( x ) dx ~ I [/(so) + 3/ (*o + f ft)] 

15e: C 0 +h f( x ) dx ~ I [/(*o) + f(xo + ft)] 
Section 4.2


1: 


<b) /' (*„ + » /(^» + >») - /(*■) 

m f / 1 M _ -3/(x 0 - /i) + 4/(x 0 ) - /(x 0 + h ) 

(f) / (xo -h)fa — 

, — 7/(x 0 — h) + 16/(x 0 + 2h) — 9/(x 0 + 3ft.) 

(h) / (x 0 - h) w ^ 

, -3/(x 0 - ft) - 10/(x 0 ) + 18/(x 0 + ft) - 6/(x 0 + 2/i) + f(x 0 + 3ft-) 

(1) / (®o) « To7 


2: (b) f (lo - 6) * /(»■ - - 2 /g°> + /(*° + ft) 

,, w „, M /(x 0 - *) - 4/(x 0 + 2h) + 3/(x 0 + 3h) 

(d) / (®o - h) ~ ^2 

,, w „, \ 11 /( a; o-*)-20/(xo) + 6/(xo + /i)+4/(xo + 2/i)-/(xo + 3/i) 

(h) / (*o) « 


4: 


rx 0 + h 

(d) / /(z)dx 


hf{x 0 ) 


(f) 

(h) 

(j) 


(*xo+2/i 


x 0 


(*a;o+/i 


/»xo+2/i 


f(x)dx 
f(x)dx f 
f{x)dx : 


h (j ^0 + ^ 

\ if{xo) + f(x 0 + h)) 

^ ^3/ ^x 0 + ^hj + f(x o + 2h)^j 


Section 4.3 

2: $f'(-2.7) \approx -0.9151775$; $f'(-2.5) \approx 1.5014075$; $f'(-2.3) \approx 2.17825$; $f'(-2.1) \approx 1.11535$
3c: 0.4897985468241977 
3e: 149/24 = 6.2083 

4: (c) 0.4693956404725931 (e) 17/2 = 8.5 
5: (c) 0.5 (e) 81/16 = 5.0625 
6: (c) $8.57775220962087(10)^{-5}$ (e) 0.0083
7: (c) 0.02031712882950837 (e) 2.3 
8: (c) 0.0102872306978985 (e) 1.1375 
10 : 0 

11b: 288666.8155482048

12b: lower: 1565.147456974753 upper: 2334.925631788689 actual: 1915.502415038936 
13b: 3.142092629759007 

17a: error term: $O(h^2 f'(\xi))$ degree of precision: 0
17e: error term: $O(h^4 f'''(\xi))$ degree of precision: 2
17g: error term: $O(h^4 f'''(\xi))$ degree of precision: 2
17i: error term: $O(h^5 f^{(4)}(\xi))$ degree of precision: 3
18a: $O(h f''(\xi))$

18e: $O(h^4 f^{(5)}(\xi))$
18g: O(hf>"(0)

18i: O(h 3 p\0) 

23a: $0.0134k$ for some constant $k$ depending on the approximation formula, not the function $\sin x$

25: (a) 0{h 3 ) (b) 1 (c) r « 2.720699046351327 (d) « 0.8612854633416616 (e) actual absolute error: 

0.7206990463513265 

27: -- 

28: $O(h^2)$

30: 0 

31: 10506.03569166666 

36: approximation 1: ~3(i-22i40)+4(i.i05i7)-i = 1 _ 2 176; approximation 2: i-34986-i.i05i7 = 1.22345; approximation 
3: ~ 3 ( 1 - 2214Q )+ 4 ( 1-34986)- 1.49182 _ i 2171; rank: 3,1,2; Other answers are acceptable. 

38: 2.58629507364657; h = .0000474853515625 


Section 4.4 

1: (c) 17.52961733248352 (e) 1.560867019857898 

2: (c) 19.3773960369059 (e) 1.569045013890161 

3: (c) 18.14554356729098 (e) 1.563593017868653 

4: (c) 18.14441877898906 (e) 1.563592239944993 

5: (c) 17.73342635968343 (e) 1.561774648629937 

8: 141 


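11b:

$\int_{x_0}^{x_0+2h} f(x)\,dx \approx \frac{h}{3n}\left[f(x_0) + f(x_0+2h) + 2\sum_{i=1}^{n-1} f\left(x_0 + 2i\frac{h}{n}\right) + 4\sum_{i=1}^{n} f\left(x_0 + (2i-1)\frac{h}{n}\right)\right]$

11e:

$\int_{x_0}^{x_0+3h} f(x)\,dx \approx \frac{3h}{8n}\left[f(x_0) + f(x_0+3h) + 2\sum_{i=1}^{n-1} f\left(x_0 + 3i\frac{h}{n}\right) + 3\sum_{i=1}^{n}\left(f\left(x_0 + (3i-2)\frac{h}{n}\right) + f\left(x_0 + (3i-1)\frac{h}{n}\right)\right)\right]$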


16: 0.386259562814567 

19: (a) 1.386294361119891 (b) 132 

21: 3.109198655184147; yes 


26: 0.3862939349171364; 5 

27: A straightforward implementation, adaptSimp():

####################################################
# Written by Leon Brin                 15 May 2014
# Purpose: Implementation of adaptive Simpson's
#          rule
# INPUT: function f, interval endpoints a and b,
#        desired accuracy TOL.
# OUTPUT: approximate integral of f(x) from a to b
#         within TOL of actual.
####################################################
function res = adaptSimp(f,a,b,TOL)
  h = (b-a)/4;
  f0 = f(a);
  f1 = f(a+h);
  f2 = f(a+2*h);
  f3 = f(a+3*h);
  f4 = f(b);
  error = abs(h*(f0-4*(f1+f3)+6*f2+f4))/45;
  if (error <= TOL)
    res = h/3*(f0+4*(f1+f3)+2*f2+f4);
  else
    res = adaptSimp(f,a,a+2*h,TOL/2) + adaptSimp(f,a+2*h,b,TOL/2);
  endif
endfunction
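
The error estimate in the listing is the difference between two Simpson approximations divided by 15: one on the whole interval with step $2h$ and one on the two halves with step $h$. The Octave sketch below checks that identity numerically. The integrand $e^x$ on $[0,1]$ and the names S1, S2, est_code, est_diff are arbitrary choices made for this illustration.

```octave
% Sample integrand and interval, chosen only for illustration.
f = @(x) exp(x);
a = 0;  b = 1;
h = (b - a)/4;
f0 = f(a); f1 = f(a+h); f2 = f(a+2*h); f3 = f(a+3*h); f4 = f(b);

% Simpson's rule on the whole interval (step 2h) and on the two halves (step h).
S1 = 2*h/3*(f0 + 4*f2 + f4);
S2 = h/3*(f0 + 4*(f1+f3) + 2*f2 + f4);

% Error estimate exactly as written in adaptSimp, and the same quantity
% expressed as |S2 - S1|/15.
est_code = abs(h*(f0 - 4*(f1+f3) + 6*f2 + f4))/45;
est_diff = abs(S2 - S1)/15;

printf("%.6e %.6e\n", est_code, est_diff);
printf("actual error of S2: %.6e\n", abs(S2 - (exp(1) - 1)));
```

The two estimates agree to rounding, and for this smooth integrand they are of the same size as the actual error of the finer approximation.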

A pair of functions that minimizes the number of evaluations of f, aSimp() and adaptiveSimpsons():

####################################################
# Written by Leon Brin                 15 May 2014
# Purpose: Wrapper for aSimp()
# INPUT: function f, interval endpoints a and b,
#        desired accuracy TOL.
# OUTPUT: approximate integral of f(x) from a to b
#         within TOL of actual.
####################################################
function res = adaptiveSimpsons(f,a,b,TOL)
  res = aSimp(f,a,b,f(a),f((a+b)/2),f(b),TOL);
end#function

####################################################
# Written by Leon Brin                 15 May 2014
# Purpose: Implementation of adaptive Simpson's
#          rule
# INPUT: function f, interval endpoints a and b,
#        f0=f(a), f2=f((a+b)/2), f4=f(b), desired
#        accuracy TOL.
# OUTPUT: approximate integral of f(x) from a to b
#         within TOL of actual.
####################################################
function res = aSimp(f,a,b,f0,f2,f4,TOL)
  h = (b-a)/4;
  f1 = f(a+h);
  f3 = f(a+3*h);
  error = abs(h*(f0-4*(f1+f3)+6*f2+f4))/45;
  if (error <= TOL)
    res = h/3*(f0+4*(f1+f3)+2*f2+f4);
  else
    res = aSimp(f,a,a+2*h,f0,f1,f2,TOL/2) + aSimp(f,a+2*h,b,f2,f3,f4,TOL/2);
  end#if
end#function

REMARK: aSimp(), adaptSimp(), and adaptiveSimpsons() must be contained in separate .m files.
adaptiveSimpsons() is the only one that should be used directly. The others are called by it. Code may be
downloaded at the companion website.

28: >> f=inline('log(sin(x))');
>> adaptiveSimpsons(f,1,3,.002)
ans = -0.70293

30a: (a) (i)

>> format('long');
>> f=inline('x*sin(x^2)');
>> adaptiveSimpsons(f,0,2*pi,10^-5)
ans = 0.603500307287469

(ii) $\left|\frac{1-\cos(4\pi^2)}{2} - 0.603500307287469\right| \approx 6.175(10)^{-7}$ (iii) yes


Section 4.5 

1: $\frac{8\sin\left(\frac{\pi h}{2}\right) - \sin(\pi h)}{3h}$

3: 0{h 9 ) 


5: (a) $N(1.0) \approx 0.4596976941318602$ and $N(0.5) \approx 0.489669752438509$

(b) (i) $N_1(1.0) \approx 0.5196418107451577$ (ii) $N_1(1.0) \approx 0.4996604385407252$

(c) assumption (i) because it yields an approximation with error about half that of $N(1.0)$, just what would
be expected if assumption (i) were true.
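
The two values in part (b) follow from the values in part (a) by one step of Richardson extrapolation under each assumption. A minimal Octave sketch, assuming that assumption (i) means an error of order $h$ and assumption (ii) an error of order $h^2$ (the variable names are introduced here for illustration):

```octave
% Values from part (a).
N_h  = 0.4596976941318602;   % N(1.0)
N_h2 = 0.489669752438509;    % N(0.5)

% Error of order h: 2*N(h/2) - N(h) cancels the leading term.
N1_i  = 2*N_h2 - N_h;

% Error of order h^2: (4*N(h/2) - N(h))/3 cancels the leading term.
N1_ii = (4*N_h2 - N_h)/3;

printf("%.16f\n%.16f\n", N1_i, N1_ii);
```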

REMARK: $\lim_{h \to 0} N(h) = \frac{1}{2}$.

9: $\frac{N(h) - 12N(h/3) + 27N(h/9)}{16}$

10: (c) 18.1436194387278 (e) 1.56359161739838 (g) 3.10928992861842 

11: The following code works, but is not very efficient and depends on a working compositeTrapezoidal() function.
In fact, the inefficiency is due to calling the compositeTrapezoidal() function. Each time
compositeTrapezoidal() is called, it recalculates values of f that it already calculated last time it was called.
Avoiding this repetition of work would make the routine much more efficient. Can you think of a way to
accomplish this? romberg.m may be downloaded at the companion website.

####################################################
# Written by Dr. Len Brin              16 May 2014
# Purpose: Implementation of Romberg integration
# INPUT: function f, interval endpoints a and b,
#        tolerance tol
# OUTPUT: approximate integral of f(x) from a to b
####################################################
function integral = romberg(f,a,b,tol)
  N(1,1)=compositeTrapezoidal(f,a,b,1);
  N(2,1)=compositeTrapezoidal(f,a,b,2);
  N(2,2)=(4*N(2,1)-N(1,1))/3;
  i=2;
  while (abs(N(i,i)-N(i,i-1))>tol || abs(N(i,i)-N(i-1,i-1))>tol)
    i=i+1;
    N(i,1)=compositeTrapezoidal(f,a,b,2^(i-1));
    for j=2:i
      m=4^(j-1);
      N(i,j)=(m*N(i,j-1)-N(i-1,j-1))/(m-1);
    end#for
  end#while
  integral=N(i,i);
end#function
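
One way to avoid the repeated evaluations, offered here only as a sketch in Octave and not as the companion-website code, is to build each trapezoidal value from the previous one: doubling the number of subintervals keeps every old node, so $T_{2n} = \frac{1}{2}T_n + h_{new}\sum f(\text{new midpoints})$. The helper trap below is a hypothetical stand-in for compositeTrapezoidal, and the integrand is the one used in exercise 12a.

```octave
% Integrand and interval from exercise 12a.
f = @(x) x.*sin(x.^2);
a = 0;  b = 2*pi;

% Direct composite trapezoidal rule with n subintervals (hypothetical helper).
trap = @(n) (b-a)/n * (sum(f(a + (0:n)*(b-a)/n)) - (f(a)+f(b))/2);

% Refinement: T(2n) = T(n)/2 + hnew * (sum of f at the new midpoints only).
T = (b-a)/2 * (f(a) + f(b));      % one subinterval
n = 1;
for k = 1:6
  hnew = (b-a)/(2*n);
  mid  = a + (2*(1:n) - 1)*hnew;  % the n points not used at the previous level
  T    = T/2 + hnew*sum(f(mid));
  n    = 2*n;
end

printf("%.12f %.12f\n", T, trap(n));
```

After six refinements the recycled value agrees with a direct 64-subinterval computation, while each level evaluates f only at points it has not seen before.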


12a: (i)

>> romberg(inline('x*sin(x^2)'),0,2*pi,10^-5)
ans = 0.603500924593406

(ii) $\left|\frac{1-\cos(4\pi^2)}{2} - 0.603500924593406\right| \approx 2.34(10)^{-10}$ (iii) yes, and not just barely


Section 5.2 

9c:

$S(x) = \begin{cases} -.28 + 3.1861(x - .2) - 3.208(x - .2)^2 - 10.693333(x - .2)^3, & x \in [.1, .2] \\ .0066 + 2.5465(x - .3) - 3.188(x - .3)^2 + .066667(x - .3)^3, & x \in [.2, .3] \\ .24 + 2.2277(x - .4) + 10.626667(x - .4)^3, & x \in [.3, .4] \end{cases}$

9f:

$S(x) = \begin{cases} -.28 + 3.84613(x - .2) - 20.0773(x - .2)^2 - 245.387(x - .2)^3, & x \in [.1, .2] \\ .0066 + 2.91347(x - .3) + 10.7507(x - .3)^2 + 102.76(x - .3)^3, & x \in [.2, .3] \\ .24 + 0.1(x - .4) - 38.8853(x - .4)^2 - 165.453(x - .4)^3, & x \in [.3, .4] \end{cases}$

10c: >> [a,b,c,d]=naturalCubicSpline([.1,.2,.3,.4],[-.62,-.28,.0066,.24])
a =
  -0.2800000   0.0066000   0.2400000

b =
   3.1861   2.5465   2.2277

c =
  -3.20800  -3.18800   0.00000

d =
  -10.693333    0.066667   10.626667

12c: >> [a,b,c,d]=clampedCubicSpline([.1,.2,.3,.4],[-.62,-.28,.0066,.24],0.5,0.1)
a =
  -0.2800000   0.0066000   0.2400000

b =
   3.84613   2.91347   0.10000

c =
  -20.077   10.751  -38.885

d =
  -245.39   102.76  -165.45
Section 6.1

1a: one
1c: two
1f: two

2a: $\dot{y}(t) = e^t$. Substituting into $\dot{y} = y$ yields $e^t = e^t$, a true statement.

2c: 

s(f) = ^e" 4/2 ^V3cos - sin 

s(t) = -\e~ t/2 ^\/3cos i^Y^j +sin {^Y^j 

Substituting into s + s + s = 0 yields 

-Y t/2 (^ cos (^) +sin (^)) + Y t/2 (^ cos (Y) - sin (^)) + e_t/2sin (Y ) = °’ 

a true statement. 

2f: r(t) = and r{t) = — Substituting into frt 2 = — | yields ^7^) (^d) ^ = ~ S> a true statement for 

/ > 0 . 

3a: $\dot{y}(t) = 4e^t$. Substituting into $\dot{y} = y$ yields $4e^t = 4e^t$, a true statement. Furthermore, $y(0) = 4e^0 = 4$ as required.

3c: s(t) = — fe _4 “. Substituting into s = (1 — 2 s)t yields — te _4 “ = |l-2x |^1 + e_t ”)) ^ a ^ rue statement. 
Furthermore, s(0) = |(1 + e°) = 1 as required. 

3f: r(t) = and f(f) = — Substituting into frt 2 = — | yields ^77^) (^7t) t 2 = — §, a true statement for 

t > 0. Furthermore, 7'(9) = \/9 — 3 = 0 and r(9) = = \ as required. 

4a: y{x) = x b + C 

4d: $y(t) = \ln|t| + C$, $t < 0$

4f: $s(t) = 3(t - 1)e^t + C$

5a: From the graphs of the exact and approximate solutions, it appears the approximation is reasonable, but gets 
progressively worse as t increases. The greatest error occurs at 1, and to be more precise, the relative error 
there is about 0.099, less than 10%. 
5c: From the graphs of the exact and approximate solutions, it appears the approximation is very good at t = 0
and t = 2, but is not particularly accurate between. To be more precise, the relative errors at t = 0.5, 1, 1.5 
are about .124, .097, and .095. At three of the five points, the relative error is 9.5% or more. 



5f: From the graphs of the exact and approximate solutions, the approximation looks very good for all values of t. 
The greatest errors seem to occur at t = 11 and t = 13. To get an idea of just how good the approximation 
is, the absolute errors at t = 11 and t = 13 are about .0066 and .0044, respectively. The relative errors are 
about .021 and .0073, respectively. All small errors. 



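6a: [diagram]

6b: [diagram]

6e: $F_{applied}$ and $F_{friction}$ may be swapped. [diagram with forces labeled N, $F_{applied}$, $F_{friction}$, mg]

6g: [diagram]

6h: [diagram with forces labeled N, $F_{friction}$, mg]

6i: [diagram]

6j: [diagram]

6l: [diagram]

6n: [diagram]

6o: [diagram]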

7 : (6a) 0 +f sin# = 0 ; (6b) with dowhnill as the positive direction: s = g(sina — /xcosa); (6e) s= 2 - F app u e d — /xg; 
(6g) with uphill as the positive direction: s = ^ F app i ie( i cos(/l — a) — g(sina + gcosa); (6h) with the direction 
of the sled’s motion as the positive direction: s = —gg] (6i) with downhill as the positive direction: s = 
g(sina — /xcosa); (6j) with the direction of the puck’s motion as the positive direction: s = — /xg; ( 61 ) with 
up as the positive direction: s = ^-s — g; (6n) with up as the positive direction: s = — g; (6o) with up as 

the positive direction: s = — g 

8: Kinetic friction: $\mu mg$ versus $\mu(mg + F_{applied}\sin 20°)$. Necessary applied force to overcome friction: $\mu mg$ versus
$\frac{\mu mg}{\cos 20° - \mu\sin 20°}$. The applied force pushing parallel to the floor will need to be only $(\cos 20° - \mu\sin 20°)$ times
as great as when pushing at 20° from parallel. For example, when $\mu = .3$, $\cos 20° - \mu\sin 20° \approx .837$ so the
necessary force pushing parallel to the floor is only 83.7% of that needed pushing at 20° from parallel.


Section 6.2 

1c: $y(2) \approx 1.3125$
2c: $y(2) \approx 1.88671875$

3c: $y(2) \approx 2.126953125$
4: $y(1.5) \approx 0.8203125$

5:
Assumptions: The solution of the o.d.e. exists and is unique on the interval from $t_0$ to $t_1$.

Input: Differential equation $\dot{y} = f(t,y)$; formula $\ddot{y}(t,y)$; initial condition $y(t_0) = y_0$; numbers $t_0$ and $t_1$;
number of steps $N$.

Step 1: Set $t = t_0$; $y = y_0$; $h = (t_1 - t_0)/N$
Step 2: For $j = 1 \ldots N$ do Steps 3-4:

Step 3: Set $y = y + hf(t,y) + \frac{1}{2}h^2\ddot{y}(t,y)$

Step 4: Set $t = t_0 + \frac{j}{N}(t_1 - t_0)$

Output: Approximation $y$ of the solution at $t = t_1$.

8: taylor2ode.m may be downloaded at the companion website.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Written by Leon Brin                          13 November 2015
% Purpose: This function implements Taylor's method of order 2
%          where the step size is calculated and held constant.
% INPUT: function f(x,y); function (df/dx)(x,y); interval [a,b];
%        y(a); steps n
% OUTPUT: approximation (x(i),y(i)) of the solution of y'=f(x,y)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [y,x] = taylor2ode(f,ft,a,ya,b,n)
  i = 1;
  x(i) = a;
  y(i) = ya;
  h = (b-a)/n;
  while (i<=n)
    y(i+1) = y(i) + h*(f(x(i),y(i)) + 0.5*h*ft(x(i),y(i)));
    x(i+1) = a + (b-a)*i/n;
    i = i+1;
  end%while
end%function
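
As a rough sanity check of the order 2 update, the sketch below applies it to a test problem that is an assumption chosen here for illustration, not an exercise from the text: $\dot{y} = y$, $y(0) = 1$ on $[0, 1]$, for which $\ddot{y} = y$ as well and the exact value is $y(1) = e$. The loop repeats the update line of taylor2ode inline so that the snippet runs on its own.

```octave
% Hypothetical test problem (not from the text): y' = y, y(0) = 1 on [0,1].
f  = @(x,y) y;       % right-hand side f(x,y)
ft = @(x,y) y;       % second derivative of the solution: y'' = y' = y
a = 0; b = 1; ya = 1; n = 10;
h = (b-a)/n;
x = a; y = ya;
for i = 1:n
  y = y + h*(f(x,y) + 0.5*h*ft(x,y));   % same update as in taylor2ode
  x = a + (b-a)*i/n;
end
err = abs(y - exp(1));                   % exact solution is y(1) = e
printf("y(1) is approximately %.10f with absolute error %.3e\n", y, err);
```

For this particular problem each step multiplies $y$ by $1 + h + h^2/2$, so the result can also be checked by hand as $1.105^{10}$.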


11:

$y(2) \approx$ 2.3125, 2.28814697265625, 2.28469446951954, 2.28402793464698
absolute errors are approximately
0.02866617919084, 0.004313151847096, $8.606487103870(10)^{-4}$, $1.941138378267(10)^{-4}$
error ratios are approximately 6.6, 5.0, 4.4.
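
The error ratios can be reproduced directly from the absolute errors listed in this answer. The short sketch below divides each error by the next one; the variable names are illustrative only.

```octave
% Absolute errors copied from the answer to exercise 11.
err = [0.02866617919084, 0.004313151847096, 8.606487103870e-4, 1.941138378267e-4];
ratios = err(1:end-1) ./ err(2:end);   % each error divided by the one that follows
printf("%.1f ", ratios);               % prints 6.6 5.0 4.4
printf("\n");
```

If each successive approximation halves the step size, which is an assumption about the exercise since its statement is not reproduced here, ratios settling toward 4 would be consistent with a method whose global error is $O(h^2)$.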


14a:

$\dot{u} = -\frac{g}{\ell}\sin\theta$
$\dot{\theta} = u$

14b:

$\dot{u} = g(\sin\alpha - \mu\cos\alpha)$
$\dot{s} = u$


14e:

$\dot{u} = \frac{1}{m}F_{\text{applied}}$
$\dot{s} = u$

14g:

$\dot{u} = \frac{1}{m}F_{\text{applied}}\cos(\beta - \alpha) - g(\sin\alpha + \mu\cos\alpha)$
$\dot{s} = u$

14h:

$\dot{u} = -\mu g$
$\dot{s} = u$

14i:

$\dot{u} = g(\sin\alpha - \mu\cos\alpha)$
$\dot{s} = u$

14j:

$\dot{u} = -\mu g$
$\dot{s} = u$

14l:

$\dot{u} = -\frac{c}{m}u - g$
$\dot{s} = u$

15: (a) -0.6656470478206087 (b) 0.2384138557742662 (e) 0.05695982142857142 (g) 0.2313498206324268 (h) 14.979875
(i) 5.988821238748838 (j) 43.9939625 (l) 4.387767857142857

18c: y(x) = fa; - § 

18d: y[x) = fa; 2 + ^x + 

18e: $y(t) = t^4 - 8t^3 + 48t^2 - 192t + 385$

18g: 6(t) = — |e _t sin t— |e _t cos t 

18i: x(t) = — 4? 

Section 6.3 

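1b:

$k_1 = f(t_i, y_i)$
$k_2 = f\left(t_i + \frac{h}{2},\; y_i + \frac{h}{2}k_1\right)$
$y_{i+1} = y_i + hk_2$

1c:

$k_1 = f(t_i, y_i)$
$k_2 = f\left(t_i + \frac{h}{3},\; y_i + \frac{h}{3}k_1\right)$
$y_{i+1} = y_i + \frac{h}{2}\left[3k_2 - k_1\right]$

1g:

$k_1 = f(t_i, y_i)$
$k_2 = f\left(t_i + \frac{h}{3},\; y_i + \frac{h}{3}k_1\right)$
$k_3 = f\left(t_i + \frac{h}{2},\; y_i + \frac{h}{2}k_2\right)$
$k_4 = f\left(t_i + \frac{2h}{3},\; y_i + \frac{2h}{3}k_1\right)$
$y_{i+1} = y_i + \frac{h}{2}\left[3k_2 - 4k_3 + 3k_4\right]$

1j:

$k_1 = f(t_i, y_i)$
$k_2 = f\left(t_i + \frac{\sqrt{5}-\sqrt{3}}{2\sqrt{5}}h,\; y_i + \frac{\sqrt{5}-\sqrt{3}}{2\sqrt{5}}hk_1\right)$
$k_3 = f\left(t_i + \frac{h}{2},\; y_i + \frac{h}{2}k_2\right)$
$k_4 = f\left(t_i + \frac{\sqrt{5}+\sqrt{3}}{2\sqrt{5}}h,\; y_i + \frac{\sqrt{5}+\sqrt{3}}{2\sqrt{5}}hk_1\right)$
$y_{i+1} = y_i + \frac{h}{18}\left[5k_2 + 8k_3 + 5k_4\right]$
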

2b: $O(h^2)$; Yes
2c: $O(h^2)$; Yes
2g: $O(h^3)$; No
2j: $O(h^2)$; No
6: on page 301 
3: 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Written by Leon Brin                               28 May 2016
% Purpose: This function implements the Midpoint method where
%          the step size is calculated and held constant.
% INPUT: function f(x,y); interval [a,b]; y(a); steps n
% OUTPUT: approximation (x(i),y(i)) of the solution of y'=f(x,y)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [y,x] = midpoint(f,a,ya,b,n)
  i = 1;
  x(i) = a;
  y(i) = ya;
  h = (b-a)/n;
  while (i<=n)
    k1 = f(x(i),y(i));
    k2 = f(x(i)+h/2,y(i)+h/2*k1);
    y(i+1) = y(i) + h*k2;
    x(i+1) = a + (b-a)*i/n;
    i = i+1;
  end%while
end%function

This code may be downloaded at the companion website. 
7:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Written by Leon Brin                               28 May 2016
% Purpose: This function implements Ralston's method where
%          the step size is calculated and held constant.
% INPUT: function f(x,y); interval [a,b]; y(a); steps n
% OUTPUT: approximation (x(i),y(i)) of the solution of y'=f(x,y)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [y,x] = ralston(f,a,ya,b,n)
  i = 1;
  x(i) = a;
  y(i) = ya;
  h = (b-a)/n;
  while (i<=n)
    k1 = f(x(i),y(i));
    k2 = f(x(i)+2*h/3,y(i)+2*h/3*k1);
    y(i+1) = y(i) + h/4*(k1+3*k2);
    x(i+1) = a + (b-a)*i/n;
    i = i+1;
  end%while
end%function

This code may be downloaded at the companion website. 


8c: 2.071336302192492 
9c: 2.237523715781341 
10c: 2.240722979472185 
11c: 2.235615854209425 
12c: 2.236251636584492 


Section 6.4 

1b: $O(h^3)$; equal to that of underlying integration formula; yes, one degree higher than rate of convergence.

1c: $O(h^3)$; equal to that of underlying integration formula; yes, one degree higher than rate of convergence.

1g: NOTE: Since this is a four-stage method, equations 6.4.5-6.4.14 must be used to determine the rate of convergence.
$O(h^4)$; less than that of underlying integration formula; yes, one degree higher than rate of convergence.

1j: NOTE: Since this is a four-stage method, equations 6.4.5-6.4.14 must be used to determine the rate of convergence.
$O(h^3)$; less than that of underlying integration formula; yes, one degree higher than rate of convergence.

4: eulerimp.m may be downloaded at the companion website. 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Written by Leon Brin                               31 May 2016
% Purpose: This function implements improved Euler's method
%          where the step size is calculated and held constant.
% INPUT: function f(x,y); interval [a,b]; y(a); steps n
% OUTPUT: approximation (x(i),y(i)) of the solution of y'=f(x,y)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [y,x] = eulerimp(f,a,ya,b,n)
  i = 1;
  x(i) = a;
  y(i) = ya;
  h = (b-a)/n;
  while (i<=n)
    k1 = f(x(i),y(i));
    k2 = f(x(i)+h,y(i)+h*k1);
    y(i+1) = y(i) + h/2*(k1+k2);
    x(i+1) = a + (b-a)*i/n;
    i = i+1;
  end%while
end%function

5: heun.m may be downloaded at the companion website. 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Written by Leon Brin                               31 May 2016
% Purpose: This function implements Heun's third order method
%          where the step size is calculated and held constant.
% INPUT: function f(x,y); interval [a,b]; y(a); steps n
% OUTPUT: approximation (x(i),y(i)) of the solution of y'=f(x,y)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [y,x] = heun(f,a,ya,b,n)
  i = 1;
  x(i) = a;
  y(i) = ya;
  h = (b-a)/n;
  while (i<=n)
    k1 = f(x(i), y(i));
    k2 = f(x(i)+h/3, y(i)+h/3*k1);
    k3 = f(x(i)+2*h/3, y(i)+2*h/3*k2);
    y(i+1) = y(i) + h/4*(k1+3*k3);
    x(i+1) = a + (b-a)*i/n;
    i = i+1;
  end%while
end%function


6: rk4.m may be downloaded at the companion website. 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Written by Leon Brin                               1 June 2016
% Purpose: This function implements Runge-Kutta 4th order (RK4)
%          where the step size is calculated and held constant.
% INPUT: function f(x,y); interval [a,b]; y(a); steps n
% OUTPUT: approximation (x(i),y(i)) of the solution of y'=f(x,y)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [y,x] = rk4(f,a,ya,b,n)
  i = 1;
  x(i) = a;
  y(i) = ya;
  h = (b-a)/n;
  while (i<=n)
    k1 = f(x(i), y(i));
    k2 = f(x(i)+h/2, y(i)+h/2*k1);
    k3 = f(x(i)+h/2, y(i)+h/2*k2);
    k4 = f(x(i)+h, y(i)+h*k3);
    y(i+1) = y(i) + h/6*(k1+2*k2+2*k3+k4);
    x(i+1) = a + (b-a)*i/n;
    i = i+1;
  end%while
end%function
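
One way to see the fourth order behavior of RK4 numerically is to halve the step size and compare errors. The sketch below does this for a test problem assumed here purely for illustration ($\dot{y} = y$, $y(0) = 1$ on $[0, 1]$, exact value $e$); it repeats the RK4 stages inline so that it runs without rk4.m.

```octave
% Hypothetical test problem (not from the text): y' = y, y(0) = 1 on [0,1].
f = @(x,y) y;
a = 0; b = 1; ya = 1;
ns = [10 20];                    % number of steps; the second run halves h
err = zeros(size(ns));
for k = 1:numel(ns)
  n = ns(k);
  h = (b-a)/n;
  x = a; y = ya;
  for i = 1:n
    k1 = f(x, y);
    k2 = f(x+h/2, y+h/2*k1);
    k3 = f(x+h/2, y+h/2*k2);
    k4 = f(x+h, y+h*k3);
    y = y + h/6*(k1+2*k2+2*k3+k4);   % same update as in rk4
    x = a + (b-a)*i/n;
  end
  err(k) = abs(y - exp(1));      % absolute error at x = 1
end
ratio = err(1)/err(2);           % roughly 2^4 = 16 is expected for a fourth order method
printf("errors: %.3e %.3e, ratio: %.2f\n", err(1), err(2), ratio);
```

A ratio in the neighborhood of 16 is what a global error of $O(h^4)$ suggests; it will not be exactly 16 because higher order terms still contribute at these step sizes.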

Section 6.5

1: One way to code it would be the following. rk23.m may be downloaded at the companion website. 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Written by Leon Brin                               31 May 2016
% Purpose: This function implements an adaptive rk2(3) method
%          where the step size is controlled by the routine.
%          Heun's third order method is combined with open-ode.
% INPUT: function f(x,y); interval [a,b]; y(a); initial step
%        size h; tolerance eps; maximum steps N;
% OUTPUT: approximation (x(i),y(i)) of the solution of y'=f(x,y)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [y,t] = rk23(f,a,ya,b,h,eps,N)
  i = 1;
  t(i) = a;
  y(i) = ya;
  done = 0;
  while (!done && i<=N)
    if ((b-t(i)-h)*(b-a)<=0)
      h = b-t(i);
      done = 1;
    endif
    k1 = f(t(i), y(i));
    k2 = f(t(i)+h/3, y(i)+h/3*k1);
    k3 = f(t(i)+2*h/3, y(i)+2*h/3*k2);
    err = abs(h/4*(k1-2*k2+k3));
    if (done || err<=eps)
      y(i+1) = y(i) + h/4*(k1+3*k3);
      t(i+1) = t(i) + h;
      if (t(i+1) == t(i))
        disp("Procedure failed. Step size reached zero.")
        return
      endif
      i = i+1;
    endif
    q = 0.9*realpow(eps/err,1/3);
    q = max(q,0.1);
    q = min(5.0,q);
    h = q*h;
  end%while
  if (!done)
    disp("Procedure failed. Maximum number of iterations reached.")
  endif
end%function
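
The step size update at the bottom of the loop in rk23 scales $h$ by $q = 0.9(\text{eps}/\text{err})^{1/3}$, limited to the range $[0.1, 5]$. The sketch below evaluates that factor for a handful of made-up error estimates to show how the limits act; the tolerance and the error values are assumptions for illustration only, and the tolerance is named `tol` here to avoid shadowing Octave's built-in `eps`.

```octave
% Step-size factor used in rk23, evaluated for sample (made-up) error estimates.
tol  = 1e-4;                            % hypothetical tolerance
errs = [1e-12, 5e-5, 1e-4, 1e-3, 10];   % hypothetical error estimates
q = 0.9*realpow(tol./errs, 1/3);        % raw factor 0.9*(tol/err)^(1/3)
q = max(q, 0.1);                        % never shrink the step by more than a factor of 10
q = min(5.0, q);                        % never grow the step by more than a factor of 5
disp([errs; q]);                        % first row: error estimates, second row: factors
```

Very small error estimates hit the upper limit of 5, very large ones hit the lower limit of 0.1, and an estimate exactly equal to the tolerance still shrinks the step slightly because of the safety factor 0.9.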


2: (a) and (d). 

12b: The Butcher tableau is 



14: One way to code it would be the following. merson.m may be downloaded at the companion website.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Written by Leon Brin                               9 June 2016
% Purpose: This function implements the method of Merson (1957)
%          where the step size is controlled by the routine.
% INPUT: function f(x,y); interval [a,b]; y(a); initial step
%        size h; tolerance eps; maximum steps N;
% OUTPUT: approximation (x(i),y(i)) of the solution of y'=f(x,y)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [y,t] = merson(f,a,ya,b,h,eps,N)
  i = 1;
  t(i) = a;
  y(i) = ya;
  done = 0;
  while (!done && i<=N)
    if ((b-t(i)-h)*(b-a)<=0)
      h = b-t(i);
      done = 1;
    endif
    k1 = f(t(i), y(i));
    k2 = f(t(i)+h/3, y(i)+h/3*k1);
    k3 = f(t(i)+h/3, y(i)+h/6*(k1+k2));
    k4 = f(t(i)+h/2, y(i)+h/8*(k1+3*k3));
    k5 = f(t(i)+h, y(i)+h/2*(k1-3*k3+4*k4));
    err = abs(h/30*(2*k1-9*k3+8*k4-k5));
    if (done || err<=eps)
      y(i+1) = y(i) + h/6*(k1+4*k4+k5);
      t(i+1) = t(i) + h;
      if (t(i+1) == t(i))
        disp("Procedure failed. Step size reached zero.")
        return
      endif
      i = i+1;
    endif
    q = 0.9*realpow(eps/err,1/4);
    q = max(q,0.1);
    q = min(5.0,q);
    h = q*h;
  end%while
  if (!done)
    disp("Procedure failed. Maximum number of iterations reached.")
  endif
end%function


15d: As can be seen from the diagram, most tolerances greater than $10^{-4}$ do not produce a global error of $10^{-4}$
or less, though there are exceptions. If just guessing and checking, likely you will end up with a tolerance of
$5(10)^{-5}$ or less.

[Diagram: Cash-Karp]

15f: As can be seen from the diagram, most tolerances less than $10^{-3}$ produce a global error of $10^{-4}$ or less, as do
some greater tolerances.

[Diagram: RK2(4)]

16d: As can be seen from the diagram, tolerances less than $10^{-4}$ produce a global error of $10^{-4}$ or less, as do some
slightly higher tolerances.

[Diagram: Cash-Karp; axis label: tolerance]

16f: As can be seen from the diagram, most tolerances less than about $5(10)^{-3}$ produce a global error of $10^{-4}$ or
less, as do some slightly greater tolerances.

[Diagram: RK2(4)]



19: (a) $y(5) \approx 6.40926980783945$; error $\approx 1.75(10)^{-4}$, 75% greater than the tolerance. (b) $y(5) \approx 6.40708478227220$;
error $\approx 2.36(10)^{-3}$, nearly 24 times the tolerance. (c) $y(5) \approx 6.40937679658180$; error $\approx 6.82(10)^{-5}$, about
68% of the tolerance. (d) $y(5) \approx 6.40885618182156$; error $\approx 5.88(10)^{-4}$, nearly 6 times the tolerance.

20: In order from most to least efficient: Cash-Karp, Merson, RK2(3), Bogacki-Shampine, with evaluations 42, 50,
69, and 138, respectively.

Index

3/8-rule Runge-Kutta method, 234

accuracy, 1, 6 

significant digits of, 24 

adaptive quadrature, see numerical integration, adaptive 
adaptive Runge-Kutta method, 227, 234 
pseudo-code, 230 
adaptive Simpson’s rule 
code, 331 

Aitken’s delta-squared method, 58, 59 


Bernstein polynomial, 110 
bisection method, 37 
analysis, 40 
pseudo-code, 39 
Bode’s rule, 156 
Bogacki-Shampine method, 235 
bracketed inverse quadratic interpolation, 98 
Octave code, 96 

bracketed Newton’s method, 91, 98 
Octave code, 92 

bracketed secant method, 91, 98 
Octave code, 92 
bracketing, 91, 98 
pseudo-code, 93 
Brent’s method, 94 
Butcher tableau, 231, 234 

Cardano 

cubic formula of, 69 
Cash-Karp method, 235, 236 
composite trapezoidal rule 
code, 281 
convergence 

order of, 19-21, 24 
rate of, 22-24 
superlinear, 56, 61 
superquadratic, 61 
convergence diagram, 60 
cubic formula, 68 

deflation, 83, 88 
differential equation, 195 

approximate solution, 196 
degree, 195 
ordinary, 195 


second order, 204 
solution, 196 
stiff, 232 

divided difference, 118, 119, 121, 123 
division 

synthetic, 71, 82 

embedded Runge-Kutta method, 234 
error, 1 

absolute, 1, 5 
algorithmic, 1, 3, 6 
floating-point, 1, 3, 6 
relative, 1, 5 
round-off, 6 
truncation, 6 
error checking, 63 
Euler’s method, 202, 205, 234 
code, 301 
pseudo-code, 203 
explicit trapezoidal method, 222 

false position, see bracketed secant method 
fixed point, 46 
attractive, 53 
repulsive, 53 

fixed point iteration method, 46, 53 
analysis, 56 
pseudo-code, 53 
floating-point arithmetic, 2, 6 
force 

applied, 196 
compression, 195 
drag, 194, 195 
frictional, 196 
gravitational, 194, 195 
normal, 195 
spring, 195 
tension, 194, 195 
free body diagram, 194 

Galileo, 193, 194 
Golomb 

Solomon, 31 

Heun 

Karl, 222, 227 
Heun’s third order method, 222, 227
code, 343 

Horner’s method, 82, 88 
code, 326 
pseudo-code, 84 
Huygens, Christiaan, 193, 194 

implicit Runge-Kutta method, 232 
improved Euler method, 222, 234 
code, 342 

initial value problem, 196, 197 
interpolating function, 106, 114 
interpolating polynomial, 114 
inverse quadratic interpolation method, 94, 98 
order of convergence, 95 
iteration, 46 

Kutta 

Martin, 222 

Lagrange form, 107, 114 
Lorenz, Edward, 4 

Muller’s method, 86, 88 

order of convergence, 87 
Maclaurin polynomial, 13 
Maxima, 141 
Merson method, 235 
code, 344 
method 

3/8-rule Runge-Kutta, see 3/8-rule Runge-Kutta method 
adaptive Runge-Kutta, see adaptive Runge-Kutta 
method 

Aitken’s delta-squared, see Aitken’s delta-squared 
method 

bisection, see bisection method 
Bogacki-Shampine, see Bogacki-Shampine method 
bracketed inverse quadratic interpolation, see brack- 
eted inverse quadratic interpolation 
bracketed Newton’s, see bracketed Newton’s method 
bracketed secant method, see bracketed secant method 
Brent’s, see Brent’s method 
Cash-Karp, see Cash-Karp method 
embedded Runge-Kutta, see embedded Runge-Kutta 
method 

Euler’s, see Euler’s method 

explicit trapezoidal, see explicit trapezoidal method 
false position, see bracketed secant method 
fixed point iteration, see fixed point iteration method 
Heun’s third order, see Heun’s third order method 
Horner’s, see Horner’s method 
implicit Runge-Kutta, see implicit Runge-Kutta method 
improved Euler, see improved Euler method 
inverse quadratic interpolation, see inverse quadratic 
interpolation method 
Muller’s, see Muller’s method 
Merson, see Merson method 
midpoint, see midpoint method 


modified Euler, see modified Euler method 
Neville’s, see Neville’s method 
Newton’s, see Newton’s method 
Ralston’s, see Ralston’s method 
regula falsi, see bracketed secant method 
RK4, see RK4 method 
Runge-Kutta, see Runge-Kutta method 
secant, see secant method 
seeded secant, see seeded secant method 
Sidi’s, see Sidi’s method 
Steffensen’s, see Steffensen’s method 
Taylor’s, see Taylor’s method 
midpoint method, 213 
code, 341 
midpoint rule, 156 
modified Euler method, 213 

Neville’s method, 111, 115 
Octave code, 114 
pseudo-code, 113 
Newton 

second law of motion, 194 
Newton form, 117, 118, 123 
Newton’s method, 65, 66, 71, 86 
pseudo-code, 66 
node, 128, 134 

numerical differentiation, 132, 137 
numerical integration, see also quadrature, 133, 139 
adaptive, 162, 164 
composite, 161, 164 
Romberg, 172 

o.d.e., see differential equation, ordinary 
Octave 
%, 27

.m file, 15, 32 
arithmetic operations, 6 
array, 25 

boolean operators, 41 
comments, 27 
comparison, 41 
constants, 8 
custom functions, 31 
disp, 26 
end, 25 
for loop, 24 
format, 6 

if then [else] , 40 
inline function, 15 
length of an array, 25 
recursive function, 33 
sprintf, 209 
standard functions, 6 
while loop, 61 

pendulum, 193-195 

π
approximation, 23
polynomial

finding all roots, 83 
Maclaurin, 13 
Taylor, 10, 13 

polynomial approximation, 134 
potential leading coefficient, 117, 123 
precision 

degree of, 149, 152 

quadratic formula 
alternate, 85 

quadrature, see also numerical integration, 152 
Gaussian, 149, 152 

Ralston’s method, 213 
code, 342 
Ramanujan 

Srinivasa, 23 
recursion, 29 

regula falsi, see bracketed secant method 
Richardson’s extrapolation, 168 
RK2(3) method, 230 
code, 344 
RK3(4) method 
code, 317 

RK4 method, 222, 223 
code, 343 

Romberg integration, 172 
code, 333 
Runge 

Carl, 222 

Runge-Kutta method, 207, 217, 227 

secant method, 67, 71 
analysis, 67 
pseudo-code, 70 
seeded secant method, 70, 71 
pseudo-code, 70 
separation of variables, 198 
Sidi’s method, 111, 115, 119 
Octave code, 121 
pseudo-code, 119 
Simpson’s rule, 156 
Simpson’s 3/8 rule, 156
Steffensen’s method, 59, 61 
code, 328 
pseudo-code, 60 
stencil, 131, 134 
stopping criterion 

for root finding, 97 
synthetic division, 82, 88 


Taylor
Brook, 14
error term, 11
polynomial, 10, 13
remainder term, 10
Taylor’s method, 201, 205
Taylor’s method of degree 3
pseudo-code, 203
Taylor’s method of order 2
code, 339
pseudo-code, 339
Theorem
Fixed Point Convergence, 48, 53
Fixed Point Error Bound, 56, 60
Fundamental Factorization, 83
Generalized Rolle’s, 114
Intermediate Value, 40
Mean Value, 53
of Algebra, Fundamental, 83
Rational Roots, 71
Rolle’s, 13
Taylor’s, 10, 13, 14
Taylor’s two variable, 217, 225
Weighted Mean Value, 152, 275
trapezoidal rule, 156
adaptive, 162
adaptive, pseudo-code, 164
composite, 161
composite, pseudo-code, 162
trominos, 30

undetermined coefficients, 137, 144, 206

validation, 63

web diagram, 47
wxMaxima, 141


